GRANT  NUMBER  DAMD17-96-1-6058 


TITLE:  Advanced  Methods  for  the  Computer-Aided  Diagnosis  of 

Lesions  in  Digital  Mammograms 


PRINCIPAL INVESTIGATOR: Maryellen L. Giger, Ph.D.


CONTRACTING  ORGANIZATION:  University  of  Chicago 

Chicago,  Illinois  60637 


REPORT  DATE:  July  1999 


TYPE  OF  REPORT:  Annual 


PREPARED  FOR: 

U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Frederick,  Maryland  21702-5012 


DISTRIBUTION  STATEMENT:  Approved  for  public  release; 

distribution  unlimited 


The  views,  opinions  and/or  findings  contained  in  this  report  are 
those of the author(s) and should not be construed as an official
Department  of  the  Army  position,  policy  or  decision  unless  so 
designated  by  other  documentation. 




REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.


1.  AGENCY  USE  ONLY  (Leave  blank) 


2. REPORT DATE

July 1999


3. REPORT TYPE AND DATES COVERED

Annual (7 Jun 98 - 6 Jun 99)


4. TITLE AND SUBTITLE

Advanced Methods for the Computer-Aided Diagnosis of Lesions in Digital Mammograms


5. FUNDING NUMBERS

DAMD17-96-1-6058


6.  AUTHOR(S) 

Maryellen  L.  Giger,  Ph.D. 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

University  of  Chicago 
Chicago,  Illinois  60637 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Frederick,  MD  21702-5012 


10.  SPONSORING/MONITORING 
AGENCY  REPORT  NUMBER 


12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 


12b.  DISTRIBUTION  CODE 


Approved  for  public  release;  distribution  unlimited 


13. ABSTRACT (Maximum 200 Words)

The objective of the proposed research is to develop computer-aided diagnosis methods
for use in mammography in order to increase the diagnostic accuracy of radiologists
and to aid in mammographic screening programs. We have increased the detection
accuracy of our computerized method by incorporating temporal information and
by developing a new single-image detection method. We have also investigated
methods for feature selection and feature merging (classifiers) when only limited
datasets are available. We have also continued to implement and investigate
the use of our clinical prototype intelligent workstation.


14. SUBJECT TERMS

Breast cancer, computer-aided diagnosis, screening


15.  NUMBER  OF  PAGES 

34 


16.  PRICE  CODE 


17. SECURITY CLASSIFICATION OF REPORT

Unclassified


18. SECURITY CLASSIFICATION OF THIS PAGE

Unclassified


19. SECURITY CLASSIFICATION OF ABSTRACT

Unclassified


20. LIMITATION OF ABSTRACT

Unlimited


NSN 7540-01-280-5500


Standard  Form  298  (Rev.  2-89) 

Prescribed by ANSI Std. Z39-18
298-102 


FOREWORD 


Opinions,  interpretations,  conclusions  and  recommendations  are 
those  of  the  author  and  are  not  necessarily  endorsed  by  the  U.S. 
Army. 

_ Where copyrighted material is quoted, permission has been
obtained to use such material.

_ Where material from documents designated for limited
distribution is quoted, permission has been obtained to use the
material.

_ Citations of commercial organizations and trade names in
this report do not constitute an official Department of Army
endorsement or approval of the products or services of these
organizations.

_ In conducting research using animals, the investigator(s)
adhered to the "Guide for the Care and Use of Laboratory
Animals," prepared by the Committee on Care and Use of Laboratory
Animals of the Institute of Laboratory Resources, National
Research Council (NIH Publication No. 86-23, Revised 1985).

For the protection of human subjects, the investigator(s)
adhered to policies of applicable Federal Law 45 CFR 46.

_ In conducting research utilizing recombinant DNA technology,
the investigator(s) adhered to current guidelines promulgated by
the National Institutes of Health.

_ In the conduct of research utilizing recombinant DNA, the
investigator(s) adhered to the NIH Guidelines for Research
Involving Recombinant DNA Molecules.

_ In the conduct of research involving hazardous organisms,
the investigator(s) adhered to the CDC-NIH Guide for Biosafety in
Microbiological and Biomedical Laboratories.


Annual Report DAMD17-96-1-6058

Table  of  Contents 

FRONT COVER

STANDARD FORM (SF 298)

FOREWORD

INTRODUCTION

BODY

KEY RESEARCH ACCOMPLISHMENTS

REPORTABLE OUTCOMES




INTRODUCTION 

Our first-year report was accepted as an excellent report as submitted. The review of our
first-year report indicated that it was well written, with extensive background
material and a meticulous description of the theoretical basis for the algorithms, neither of which is
necessary in future annual reports; both can be referred to with appropriate citations.
Thus, in this third-year report, we have substantially shortened the background sections
and refer the reviewer to our first-year report, especially now that the report is required to be only
2-5 pages.

Although mammography is currently the best method for the detection of breast cancer, 10-30%
of women who have breast cancer and undergo mammography have negative mammograms.
In  approximately  two-thirds  of  these  false-negative  mammograms,  the  radiologist  failed  to  detect  the 
cancer  that  was  evident  retrospectively.  Low  conspicuity  of  the  lesion,  eye  fatigue  and  inattentiveness 
are  possible  causes  for  these  misses.  We  believe  that  the  effectiveness  (early  detection)  and  efficiency 
(rapid  diagnosis)  of  screening  procedures  could  be  increased  substantially  by  use  of  a  computer 
system  that  successfully  aids  the  radiologist  by  indicating  locations  of  suspicious  abnormalities  in 
mammograms.  In  addition,  many  breast  cancers  are  detected  and  referred  for  surgical  biopsy  on  the 
basis  of  a  radiographically  detected  mass  lesion  or  cluster  of  microcalcifications.  Although  general 
rules  for  the  differentiation  between  benign  and  malignant  breast  lesions  exist,  considerable 
misclassification  of  lesions  occurs  with  the  current  methods.  On  average,  only  10-30%  of  masses 
referred  for  surgical  breast  biopsy  are  actually  malignant.  Surgical  biopsy  is  an  invasive  technique  that 
is  an  expensive  and  traumatic  experience  for  the  patient  and  leaves  physical  scars  that  may  hinder  later 
diagnoses  (to  the  extent  of  requiring  repeat  biopsies  for  a  radiographic  tumor-simulating  scar).  A 
computerized  method  capable  of  detecting  and  analyzing  the  characteristics  of  benign  and  malignant 
masses,  in  an  objective  manner,  should  aid  radiologists  by  reducing  the  numbers  of  false-positive 
diagnoses  of  malignancies,  thereby  decreasing  patient  morbidity  as  well  as  the  number  of  surgical 
biopsies  performed  and  their  associated  complications. 


Purpose  of  the  present  work 

The  main  hypothesis  to  be  tested  is  that  given  dedicated  computer-vision  programs  for  the 
computer-assisted  interpretation  of  mammograms,  the  diagnostic  accuracy  for  mammographic 
interpretation  will  be  improved,  yielding  earlier  detection  of  breast  cancer  (i.e.,  a  reduction  in  the 
number  of  missed  lesions)  and  a  reduction  in  the  number  of  benign  cases  sent  to  biopsy.  Computer- 
aided  diagnosis  (CAD)  can  be  defined  as  a  diagnosis  made  by  a  radiologist  who  takes  into 
consideration  the  results  of  a  computerized  analysis  of  radiographic  images  and  uses  them  as  a 
"second  opinion"  in  detecting  lesions  and  in  making  diagnostic  decisions.  The  final  diagnosis  would 
be  made  by  the  radiologist. 

Methods  of  approach 

The  objective  of  the  proposed  research  is  to  develop  computer-aided  diagnosis  methods  for  use  in 
mammography  in  order  to  increase  the  diagnostic  decision  accuracy  of  radiologists  and  to  aid  in 
mammographic  screening  programs.  The  CAD  methods  will  include  a  parallel  method  for  the 
detection  of  a  range  of  mass  types  and  for  the  incorporation  of  information  from  multiple  views  (i.e., 
CC  and  MLO,  and  prior  mammograms). 

The  specific  objectives  of  the  research  to  be  addressed  are: 

(1)  Development  of  advanced  computerized  schemes  for  the  detection  and  classification  of  masses  in 
digital  mammograms. 

(a)  Development  of  a  computerized  detection  scheme  for  spiculated  lesions  and  architectural 
distortions  based  on  the  calculation  of  the  Hough  spectrum. 

(b) Development of a computerized detection scheme for small, low-contrast early cancers based
on gradient and circularity filters.

(c)  Incorporation  of  the  two  new  methods  with  a  previously-developed  bilateral-subtraction 
method  along  with  feature  analyses  into  a  system  for  lesion  detection. 

(d)  Further  development  of  computerized  classification  schemes  for  masses. 

(2)  Development  of  computerized  methods  based  on  multiple  views  for  enhanced  mammographic 
interpretation. 

(a)  Development  of  computerized  methods  for  the  incorporation  of  image  information  from  the 
CC  and  MLO  views  of  mammographic  examinations. 

(b)  Development  of  computerized  methods  for  analysis  of  temporal  change  between 
mammographic  examinations. 

(3) Incorporation of the computer-vision methods with a Mammo/Icon mammographic review
system for enhanced diagnosis.

(a) Expansion of the Mammo/Icon database descriptors to include CAD-derived parameters.

(b) Calculation of the computer-extracted features of images in the Mammo/Icon database.

(c)  Development  of  hardware  and  software  interfaces  for  CAD  and  Mammo/Icon. 

(4)  Evaluation  of  the  CAD  methods  for  mammography. 


BODY 

Development  of  advanced  computerized  schemes  for  the  detection  of  masses  in  digital 
mammograms.  Results  to  date 

With the single-image method for detection of small invasive breast cancers, localized density
peaks on mammograms are identified using a gradient/circularity filter. Lesion contours are
generated by matching a deformable template onto a second-derivative edge map. In a preliminary
study (without further feature analyses to reduce false positives) using 45 non-palpable invasive breast
cancers, all smaller than 1 cm (median size of 7 mm), 82% of the cancers were detected at an
average false-positive rate of 2.8 per image.

In the Hough spectrum geometric texture analysis technique, the mammogram is analyzed ROI by
ROI. Each ROI is transformed into its Hough spectrum, which is then thresholded at a
level based on the statistical properties of the spectrum. ROIs with strong signals of
spiculation are then flagged as regions of potential lesions. In a preliminary study, 32 images
containing spiculated lesions/architectural distortions (biopsy confirmed) were analyzed using
information extracted from the Hough spectrum. Using only the Hough
spectrum based technique, without further feature analyses to reduce false positives, we obtained
sensitivities of 81% for spiculated masses and 67% for architectural distortions at false-positive rates
of 0.97 and 2.2 per image, respectively. We have also converted the method into an AVS-based
program to expedite the development and optimization of parameters such as ROI size.

Output from the bilateral subtraction method and that of the gradient/circularity filtering were
combined and analyzed. Many masses were detected by both preprocessing methods. For a database
of 20 cancer cases, bilateral subtraction yielded a sensitivity of 75% (at 1.8 false-positive detections per
image) and the gradient/circularity filter yielded a sensitivity of 70% at the same false-positive rate.
The gradient/circularity filter found masses that bilateral subtraction did not, so combining the two
raised the sensitivity to 80%. We are currently comparing the overlap in false positives to determine
the false-positive rate for the combined scheme and to find a means of improving the false-positive
rate while optimizing the sensitivity of each method.
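The gain from combining the two prefiltering outputs can be illustrated by treating each method's detections as a set of cases and taking their union. The sketch below is only an illustration: the case identifiers are hypothetical, and only the counts (15, 14 and 16 of 20 cases) follow from the percentages reported above.

```python
def sensitivity(detected_cases, cancer_cases):
    """Fraction of cancer cases in which the lesion was detected."""
    return len(detected_cases & cancer_cases) / len(cancer_cases)


# Hypothetical case identifiers for a 20-case database.
cancer_cases = set(range(20))
bilateral_hits = set(range(0, 15))   # 15 of 20 cases
gradient_hits = set(range(2, 16))    # 14 of 20 cases, including case 15

# A lesion counts as detected by the combined scheme if either method finds it.
combined_hits = bilateral_hits | gradient_hits

for name, hits in [("bilateral subtraction", bilateral_hits),
                   ("gradient/circularity", gradient_hits),
                   ("combined", combined_hits)]:
    print(f"{name}: {sensitivity(hits, cancer_cases):.0%}")
```

The same set arithmetic applied to false-positive marks would give the false-positive overlap between the two methods.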

Since November 8, 1994, all screening mammograms taken at the University of Chicago Hospitals
have been analyzed on our clinical prototype mammography workstation, except during downtimes.
Downtime has been minimal, less than 20 days in total, which includes a 3-week period when the
mammography section moved to a new outpatient center. During that move, networking problems in
the new facility contributed to computer system difficulties. For cases in which a cancer was detected,
we also retrospectively reviewed any previous mammograms that were in our study cohort. Two
radiologists independently reviewed the cases and stated whether the cancer was visible in a previous
exam and whether, knowing that the lesion was present, they would call the patient back for a
diagnostic exam based on the findings in the previous exam. In this way, the number of cancers
detected by the computer that were initially missed by the radiologists was determined.

As of May 1, 1998, over 14,000 cases have been analyzed. With follow-up on the first 10,000 cases, 61 patients
have been diagnosed with breast cancer. In 12 of these cases, the screening mammogram(s) were
negative even in retrospect. For the mammographically visible cases (n=49), the sensitivity of the two
schemes was 68% (34/49). Clinically, 96% of the cancers were detected (47/49). More important
than the absolute sensitivity of the workstation is its ability to detect breast cancers that may be missed
by a radiologist. For 30 of the 61 cancers, the patient had an earlier screening exam in our study that was
read as negative, that is, a screening mammogram read as normal that preceded the diagnosis of cancer.
In 14 of these 30 cases, no lesion could be seen in retrospect, i.e., the exam was
mammographically negative. In 9 of the remaining 16 cases, the computer was able to identify the region on the
negative-read (cancer visible in retrospect) screening mammogram that corresponded to where the
cancer was subsequently detected. Overall, the computer was able to identify the cancer approximately
one year before it was diagnosed in approximately 15% (9/61) of all cancer cases and in 56% (9/16)
of cases where the cancer was visible in retrospect on a negative-read screening mammogram.

The false-positive rate was approximately 1.3 false clusters per image and 2.1 false masses per image. The
types of false-positive detections found by the computer in mass detection and clustered
microcalcification detection were investigated for 1296 cases. Over 80% of the mass false positives
indicated by the computer were due to nodular densities on the film.
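The bookkeeping behind the prior-exam figures can be written out as a short calculation. This is a sketch for checking the arithmetic only; the function and key names are hypothetical, and the counts are the ones reported above.

```python
def prior_exam_summary(total_cancers, prior_negative_reads,
                       not_visible_in_retrospect, marked_by_computer):
    """Derive the retrospective-detection fractions from the raw counts."""
    # Prior negative-read exams on which the cancer could be seen in retrospect.
    visible = prior_negative_reads - not_visible_in_retrospect
    return {
        "visible_in_retrospect": visible,
        "fraction_of_visible": marked_by_computer / visible,
        "fraction_of_all_cancers": marked_by_computer / total_cancers,
    }


summary = prior_exam_summary(total_cancers=61, prior_negative_reads=30,
                             not_visible_in_retrospect=14, marked_by_computer=9)
for key, value in summary.items():
    print(key, round(value, 3))
```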

In order to determine the effect of false-positive detections on mammographic interpretation, we
calculated the callback rate in one-year time periods before and after implementation of the
workstation in the clinical area. The callback rate is the fraction of screening mammograms read as
abnormal. Before the introduction of CAD, 13.2% of screening patients were called back for further workup;
after the introduction of CAD, 12.6% were called back. Thus, the
false-positive output from the computer did not increase the number of women called back.

A new development, which is now being implemented into the various detection and classification
schemes for mammographic masses, is a new region-growing algorithm. The segmentation of lesions
from surrounding background is a vital step in many computerized mass detection schemes. We have
developed two novel lesion segmentation techniques: one based on a single feature called the radial
gradient index (a feature similar to that described above) and one based on a simple probabilistic model
to segment mass lesions from surrounding background. In both methods a series of image partitions
is created using gray-level information as well as prior knowledge of the shape of typical mass lesions.
With the former method the partition that maximizes the radial gradient index is selected. In the latter
method, probability distributions for gray levels inside and outside the partitions are estimated and
subsequently used to determine the probability that the image occurred for each given partition. The
partition that maximizes this probability is selected as the final lesion partition (contour). We tested
these methods against our previous region-growing algorithm using a database of biopsy-proven,
malignant lesions and found that the new lesion segmentation algorithms more closely match
radiologists' outlines of these lesions. At an overlap threshold of 0.30, gray-level region growing
correctly delineates 62% of the lesions in our database while the radial gradient index (RGI) algorithm
and the probabilistic segmentation algorithm correctly segment 92% and 96% of the lesions,
respectively. With these new segmentation results we hope to find and extract new features that will
help differentiate between actual lesions and false-positive detections, thus improving the overall
performance of computerized mass detection.
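The overlap criterion used to score a segmentation can be sketched as follows. The report does not define the overlap measure, so this sketch assumes it is the area of intersection divided by the area of union between the computer segmentation and the radiologist's outline. The pixel sets and the helper `square` are hypothetical test shapes.

```python
def overlap(computer_pixels, radiologist_pixels):
    """Assumed overlap measure: intersection area divided by union area."""
    union = computer_pixels | radiologist_pixels
    if not union:
        return 0.0
    return len(computer_pixels & radiologist_pixels) / len(union)


def fraction_correct(pairs, threshold=0.30):
    """Fraction of lesions whose overlap meets or exceeds the threshold."""
    hits = sum(1 for computer, truth in pairs if overlap(computer, truth) >= threshold)
    return hits / len(pairs)


def square(row0, col0, size):
    """Hypothetical square region given as a set of (row, col) pixels."""
    return {(r, c) for r in range(row0, row0 + size)
            for c in range(col0, col0 + size)}


truth = square(0, 0, 10)
good_segmentation = square(2, 2, 10)   # shares an 8 x 8 block with the outline
poor_segmentation = square(7, 7, 10)   # shares only a 3 x 3 block

print(overlap(good_segmentation, truth), overlap(poor_segmentation, truth))
print(fraction_correct([(good_segmentation, truth), (poor_segmentation, truth)]))
```

Under this assumed measure, raising the threshold makes the criterion stricter, so the percentages quoted at 0.30 would be expected to fall at higher thresholds.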

A new extension of the region-growing method was developed. The radial gradient index (RGI)
region-growing method is now being implemented at the very first stage of the mass detection
algorithm in order to increase the sensitivity for mass detection. Thus, this RGI algorithm replaces the
bilateral subtraction methodology in the overall computerized mass detection method. The benefit of
this change is that cases with unilateral mammograms can be analyzed by the computer method. In
addition, the sensitivity of the detection algorithm increased by 15%.

In  order  to  improve  the  classifier  performance  in  the  detection  method  for  distinguishing  between 
actual  lesions  and  false-positive  detections,  we  investigated  feature  selection  with  limited  datasets  and 
the  use  of  probabilistic  artificial  neural  networks.  In  many  computerized  schemes,  numerous  features 
can  be  extracted  to  describe  suspect  image  regions.  A  subset  of  these  features  is  then  employed  in  a 
data  classifier  to  determine  whether  the  suspect  region  is  abnormal  or  normal.  Different  subsets  of 
features  will,  in  general,  result  in  different  classification  performances.  A  feature  selection  method  is 
often  used  to  determine  an  "optimal"  subset  of  features  to  use  with  a  particular  classifier.  A  classifier 
performance  measure  (such  as  the  area  under  the  receiver  operating  characteristic  (ROC)  curve)  must 
be  incorporated  into  this  feature  selection  process.  With  limited  datasets,  however,  there  is  a 
distribution  in  the  classifier  performance  measure  for  a  given  classifier  and  subset  of  features.  We 
investigated  the  variation  in  the  selected  subset  of  "optimal"  features  as  compared  with  the  true  optimal 
subset  of  features  caused  by  this  distribution  of  classifier  performance.  We  considered  examples  in 
which  the  probability  that  the  optimal  subset  of  features  is  selected  can  be  analytically  computed.  We 
showed  the  dependence  of  this  probability  on  the  dataset  sample  size,  the  total  number  of  features  from 
which  to  select,  the  number  of  features  selected,  and  the  performance  of  the  true  optimal  subset.  Once 
a  subset  of  features  has  been  selected,  the  parameters  of  the  data  classifier  must  be  determined.  We 
showed  that,  with  limited  datasets  and/or  a  large  number  of  features  from  which  to  choose,  bias  is 
introduced  if  the  classifier  parameters  are  determined  using  the  same  data  that  were  employed  to  select 
the  "optimal"  subset  of  features. 
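A small simulation can illustrate a closely related effect: the optimism that arises when the data used to select a feature are reused to estimate its performance. This is a minimal sketch, not the analysis performed in this work. It assumes pure-noise features (true area under the ROC curve of 0.5), a single selected feature, and an empirical (Mann-Whitney) estimate of the area; all names and sample sizes are hypothetical.

```python
import random


def auc(pos, neg):
    """Empirical area under the ROC curve (Mann-Whitney statistic)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def noise_dataset(rng, n_per_class, n_features):
    """Both classes are drawn from N(0, 1), so no feature is informative."""
    def draw():
        return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in range(n_per_class)]
    return draw(), draw()


def selection_bias(n_trials=100, n_per_class=15, n_features=30, seed=0):
    rng = random.Random(seed)
    resubstitution, independent = [], []
    for _ in range(n_trials):
        pos, neg = noise_dataset(rng, n_per_class, n_features)
        scores = [auc([x[j] for x in pos], [x[j] for x in neg])
                  for j in range(n_features)]
        best = max(range(n_features), key=scores.__getitem__)
        # Performance estimated on the same data that drove the selection.
        resubstitution.append(scores[best])
        # Performance of the selected feature on fresh data.
        test_pos, test_neg = noise_dataset(rng, n_per_class, n_features)
        independent.append(auc([x[best] for x in test_pos],
                               [x[best] for x in test_neg]))
    return sum(resubstitution) / n_trials, sum(independent) / n_trials


mean_resub, mean_indep = selection_bias()
print(f"same-data estimate: {mean_resub:.2f}, independent estimate: {mean_indep:.2f}")
```

Increasing `n_per_class` or reducing `n_features` narrows the gap between the two estimates, which is the qualitative dependence on sample size and feature-pool size described above.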

It  is  well  understood  that  the  optimal  classification  decision  variable  is  the  likelihood  ratio  or  any 
monotonic  transformation  of  the  likelihood  ratio.  An  automated  classifier  which  maps  from  an  input 
space  to  one  of  the  likelihood  ratio  family  of  decision  variables  is  an  optimal  classifier  or  an  ideal 
observer.  Artificial  neural  networks  (ANNs)  are  frequently  used  as  classifiers  for  many  problems.  In 
the  limit  of  large  sample  sizes,  an  ANN  approximates  a  mapping  function  which  is  a  monotonic 
transformation  of  the  likelihood  ratio,  i.e.,  it  estimates  an  ideal  observer  decision  variable.  The 
disadvantages  of  conventional  ANNs  include  the  potential  over-parameterization  of  the  mapping 
function  which  results  in  a  poor  approximation  of  an  optimal  mapping  function  for  smaller  sample 
sizes.  Recently,  Bayesian  methods  have  been  applied  to  ANNs  in  order  to  regularize  training  to 
improve  the  robustness  of  the  classifier.  A  Bayesian  ANN  should  thus  better  approximate  the  optimal 
decision  variable  given  small  sample  sizes.  We  have  evaluated  the  accuracy  of  Bayesian  ANN  models 
of  ideal  observer  decision  variables  as  a  function  of  the  number  of  hidden  units  used,  the  signal-to- 
noise  ratio  of  the  data,  and  the  number  of  features  or  dimensionality  of  the  data.  We  showed  that  when 
enough  training  data  are  present,  excess  hidden  units  do  not  substantially  degrade  the  accuracy  of 
Bayesian  ANNs.  The  minimum  number  of  hidden  units  required  to  best  model  the  optimal  mapping 
function,  however,  varies  with  the  complexity  of  the  data. 
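The role of the likelihood ratio as a decision variable can be illustrated with a one-dimensional example. The sketch below assumes two zero-mean Gaussian classes with different standard deviations (hypothetical values of 1 and 3). In that case the raw measurement separates the classes poorly, while the log-likelihood ratio, or any monotonic transformation of it, separates them much better and yields the same empirical ROC area.

```python
import math
import random


def auc(pos, neg):
    """Empirical area under the ROC curve (Mann-Whitney statistic)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_likelihood_ratio(x, sigma0=1.0, sigma1=3.0):
    """log[p(x | class 1) / p(x | class 0)] for zero-mean Gaussian classes."""
    return (math.log(sigma0 / sigma1)
            + 0.5 * x * x * (1.0 / sigma0 ** 2 - 1.0 / sigma1 ** 2))


rng = random.Random(1)
class0 = [rng.gauss(0.0, 1.0) for _ in range(200)]
class1 = [rng.gauss(0.0, 3.0) for _ in range(200)]

llr0 = [log_likelihood_ratio(x) for x in class0]
llr1 = [log_likelihood_ratio(x) for x in class1]

auc_raw = auc(class1, class0)                      # raw measurement as decision variable
auc_llr = auc(llr1, llr0)                          # log-likelihood ratio
auc_lr = auc([math.exp(v) for v in llr1],          # likelihood ratio itself
             [math.exp(v) for v in llr0])

print(f"raw x: {auc_raw:.2f}, log LR: {auc_llr:.2f}, LR: {auc_lr:.2f}")
```

Because the ROC area depends only on the rank ordering of the decision variable, the last two values agree; this is why a classifier needs to estimate only some monotonic transformation of the likelihood ratio.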

Development  of  advanced  computerized  schemes  for  the  classification  of  masses  in  digital 
mammograms.  Results  to  date 

We  are  investigating  the  potential  usefulness  of  computer-aided  diagnosis  as  an  aid  to  radiologists 
in  the  characterization  and  classification  of  mass  lesions  in  mammography.  Ninety-five  mammograms 
containing  masses  from  65  patients  were  digitized.  Various  features  related  to  the  margin,  shape  and 
density  of  each  mass  were  extracted  automatically  from  the  neighborhoods  of  the  computer-identified 
mass  regions.  Selected  features  were  merged  into  an  estimated  likelihood  of  malignancy  using  three 
different  automated  classifiers.  The  performance  of  the  three  classifiers  in  distinguishing  between 
benign  and  malignant  masses  was  evaluated  by  receiver  operating  characteristic  (ROC)  analysis,  and 
compared  with  those  of  an  experienced  mammographer  and  of  five  less  experienced  mammographers. 
Our computer classification scheme yielded an Az value of 0.94, similar to that of an experienced
mammographer (Az=0.91) and statistically significantly higher than the average performance of the
radiologists with less mammographic experience (Az=0.80). With the database we used, the computer
scheme achieved, at 100% sensitivity, a positive predictive value of 83%, which was 12% higher than
that of the experienced mammographer and 21% higher than that of the average performance of the
less experienced mammographers at a p-value of less than 0.001. Thus, automated computerized
classification  schemes  may  be  useful  in  helping  radiologists  distinguish  between  benign  and  malignant 
masses. 
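The operating point "at 100% sensitivity" can be made concrete with a short calculation. In this sketch the decision threshold is set at the lowest score assigned to any malignant case, so that every malignant case is called positive, and the positive predictive value and specificity are computed at that threshold. The scores are hypothetical and are not taken from the study database.

```python
def operating_point_at_full_sensitivity(malignant_scores, benign_scores):
    """PPV and specificity at the highest threshold that misses no malignant case."""
    threshold = min(malignant_scores)
    true_pos = len(malignant_scores)                       # all malignant cases called positive
    false_pos = sum(1 for s in benign_scores if s >= threshold)
    true_neg = len(benign_scores) - false_pos
    return {
        "threshold": threshold,
        "ppv": true_pos / (true_pos + false_pos),
        "specificity": true_neg / len(benign_scores),
    }


# Hypothetical likelihood-of-malignancy outputs.
malignant = [0.90, 0.80, 0.75, 0.60]
benign = [0.70, 0.50, 0.30, 0.20, 0.10]

point = operating_point_at_full_sensitivity(malignant, benign)
print(point)
```

Note that the positive predictive value computed this way depends on the proportion of malignant cases in the database, which is why the figure above is qualified as applying to the database we used.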

We have also investigated the effect of dominant features on neural network performance in the task
of classification of mammographic lesions. Two different classifiers, an artificial neural network (ANN)
and a hybrid system (a one-step rule-based method followed by an artificial neural network), were
investigated to merge computer-extracted features in the classification of malignant and benign masses.
Four computer-extracted features were used in the study: spiculation, margin sharpness and two
density-related measures. ROC analysis showed that the hybrid system performed significantly better than the
ANN method at high sensitivity levels, yielding an Az of 0.94 with a specificity of 69% at 100%
sensitivity, whereas the ANN method yielded an Az of 0.90 with a specificity of 19% at 100%
sensitivity. To understand the difference between the two classifiers in their performance, we
investigated their learning and decision-making processes by studying the relationships between the
outputs and input features. The correlation study showed that the outputs from the ANN-alone method
strongly correlated with one of the input features (the spiculation measure), yielding a correlation coefficient
of 0.91, while the correlation coefficients (absolute value) for the other features ranged from 0.19 to 0.40.
The strong correlation between the ANN output and the spiculation measure indicates that the learning and
decision-making processes of the ANN-alone method were dominated by the spiculation measure. A
series of three-dimensional plots of the computer output as functions of the input features demonstrated
that the ANN method did not learn as effectively as the hybrid system from the other three features in
differentiating subtle (non-spiculated) malignant masses from benign masses, thus resulting in its
inferior performance at high sensitivity levels. We found that, with a limited database, the presence of a
dominant feature hinders an ANN from learning the significance of the other features. The hybrid
system, which applied a rule on the spiculation measure prior to an ANN, prevented
over-learning from the dominant feature and performed better than the ANN-alone method in merging the
computer-extracted features into a correct diagnosis on the malignancy of the masses.
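The structure of a rule-then-network classifier can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in this work: the cutoff value, the feature names, the weights, and the choice to give the second stage only the non-spiculation features are all hypothetical, and a single logistic unit stands in for the trained ANN.

```python
import math


def second_stage(features, weights, bias):
    """Stand-in for the ANN: a logistic unit over the non-spiculation features."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def hybrid_likelihood(features, spiculation_cutoff, weights, bias):
    """Apply a rule on the dominant feature first, then score the remaining cases."""
    # Rule-based step: a strongly spiculated mass is called malignant outright.
    if features["spiculation"] >= spiculation_cutoff:
        return 1.0
    # Remaining (subtle) masses are scored from the other features.
    return second_stage(features, weights, bias)


# Hypothetical weights and feature values, all scaled to the range 0-1.
weights = {"margin_sharpness": -2.0, "density_a": 1.0, "density_b": 1.0}
spiculated_mass = {"spiculation": 0.9, "margin_sharpness": 0.5,
                   "density_a": 0.1, "density_b": 0.1}
subtle_mass = {"spiculation": 0.1, "margin_sharpness": 0.2,
               "density_a": 0.8, "density_b": 0.7}

for mass in (spiculated_mass, subtle_mass):
    print(hybrid_likelihood(mass, spiculation_cutoff=0.6, weights=weights, bias=0.0))
```

The design point is that the second stage is fitted and applied only to cases the rule did not resolve, so its parameters are driven by the less dominant features.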

Currently  in  mammography,  the  digital  image  on  which  CAD  analysis  is  performed  is  obtained  by 
digitizing  a  screen-film  mammogram.  Since  the  image  is  sampled  when  digitized,  the  digitization  of  a 
image  using  two  different  scanners  will  not  produce  exactly  the  same  digital  image  (because  of 
different  designs,  sampling  aperture,  sampling  distance  and  internal  electronic  noise,  etc.  of  the  laser 
scanners  and  the  different  calibration  curves  for  the  transformation  of  the  optical  density  (OD)  to  pixel 
value).  Thus,  the  contrast,  noise  and  resolution  of  the  two  images  may  differ.  Thus,  as  long  as  CAD 
analysis  relies  upon  digitized  screen-film  images,  a  CAD  system  (film  digitization  and  computer 
analysis)  may  suffer  from  the  variability  in  the  digital  formats  of  a  image,  which  may  lead  to  variations 
in  the  performance  of  the  CAD  scheme.  Two  different  databases  and  three  different  digitizers  were 
involved  in  this  study.  One  database  consisted  of  95  mammograms  collected  from  65  cases:  39 
biopsy-confirmed  malignant  cases,  25  biopsy-confirmed  benign  cases  and  one  benign  case  which  was 
determined  through  more  than  five  years  of  follow-up.  These  mammograms  were  digitized  using  an 
optical  drum  scanner  (FTP  II,  Fuji  Film,  Tokyo,  Japan)  at  a  sampling  distance  of  0.1  mm  and  10-bit 
quantization.  Another  database  consisted  of  110  new  cases  which  were  collected  from  the  University 
of  Chicago  Radiology  files.  Of  these,  50  cases  are  biopsy-confirmed  malignant,  50  cases  are  biopsy- 
confirmed  benign  diseases  and  10  cases  are  aspiration-confirmed  cysts.  For  each  case,  two  standard 
views  of  the  affected  breast  were  chosen  from  a  single  screening  exam.  Of  the  1 10  cases,  8  cases  had 
a  mass  appearing  on  one  view  only.  Each  mammogram  this  second  database  was  digitized  twice  using 
two  different  laser  scanners  —  a  Konica  digitizer  (LD  4500;  Konica  Medical,  Wayne,  NJ)  at  0.1-mm 
pixel size and 10-bit quantization and a Lumisys laser scanner (Lumiscan 100, Lumisys, Sunnyvale,
CA) at a 0.1-mm pixel size and 12-bit quantization. In the evaluation of our classification scheme,
both Az and 0.90Az are important indices. The Az value was used to evaluate the overall performance,
while the partial area index (0.90Az) was designed to evaluate the performance of a scheme at a
preselected high sensitivity level, for readers who are interested in the performance at high
sensitivity. In addition, the difference in the partial area index 0.90Az quantitatively evaluates, to some
degree,  the  difference  in  the  shape  of  the  two  ROC  curves.  The  differences  in  Az  between  the  two 
digital formats were the same for both the ANN-alone and hybrid classifiers. Two-tailed p values
obtained from the CLABROC programs showed that the differences in the performance of the
classification scheme due to the two digitization techniques, for both the
ANN and the hybrid classifier, were not statistically significant at the 0.05 level in terms of Az
and 0.90Az.

In  order  to  observe  the  effect  of  the  computer  aid  on  radiologists’  performance  in  the  task  of 
distinguishing  between  malignant  and  benign  lesions,  we  performed  an  observer  study  at  RSNA  ’98. 
The  mass  classification  method  was  run  on  both  the  MLO  and  CC  views  and  the  magnification  views. 
We  showed  that  the  average  performance  of  128  radiologists  (who  participated  in  the  study)  increased 
significantly  from  an  Az  of  0.89  to  an  Az  of  0.94  (p  <  0.05)  when  the  computer  aid  was  used  in 
distinguishing 20 mass lesion cases.

Development  of  computerized  methods  based  on  multiple  views  for  enhanced 
mammographic  interpretation.  Results  to  date 

We  have  evaluated  the  potential  benefit  of  incorporating  a  temporal  subtraction  scheme  with  our 
bilateral  subtraction  technique  for  improving  the  sensitivity  of  mass  detection.  A  database  of  79  cases 
was  used,  each  of  which  contained  a  lesion  in  at  least  the  current  exam.  Two  methods  for  image 
registration  of  the  temporal  images  were  investigated:  one  used  translation  and  rotation  based  on 
computer-determined  skin  lines  and  the  other  used  a  warping  technique  based  on  the  cross-correlation 
of  regions  of  interest  located  throughout  the  parenchyma.  The  characteristics  of  the  false-positive 
detections  resulting  from  the  bilateral  subtraction  and  from  the  temporal  subtraction  were  analysed. 

The distributions of the true positives and false positives were similar despite the fact that many of the
false  positives  resulting  from  the  two  schemes  were  in  different  locations  in  the  breast  parenchyma.  At 
a  false-positive  rate  of  four  per  image,  the  combined  (Logical  OR)  scheme  detected  85%  of  the  masses, 
which  was  8%  greater  than  the  bilateral  subtraction  technique  alone.  Although  further  work  is  needed 
to  reduce  the  false-positive  rate,  the  combined  use  of  bilateral  and  temporal  subtraction  methods  shows 
potential  for  an  improvement  in  sensitivity  in  the  detection  of  masses. 
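
As a rough illustration of the combined (logical OR) scheme described above, the following sketch merges the detections of two schemes and recomputes sensitivity. The lesion identifiers, the `sensitivity` helper, and all numbers are hypothetical toy values, not data from this study.

```python
def sensitivity(detected_ids, lesion_ids):
    """Fraction of true lesions that were detected."""
    return len(detected_ids & lesion_ids) / len(lesion_ids)

# Toy data (hypothetical): IDs of the lesions found by each scheme.
lesions = {1, 2, 3, 4, 5}
bilateral_hits = {1, 2, 3}
temporal_hits = {2, 3, 4}

# Logical OR: a lesion counts as detected if either scheme finds it.
combined_hits = bilateral_hits | temporal_hits

bilateral_sens = sensitivity(bilateral_hits, lesions)
combined_sens = sensitivity(combined_hits, lesions)
```

An OR combination also pools the false positives of both schemes, which is consistent with the remark that further false-positive reduction is needed.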

We  are  investigating  how  the  lesion  features  as  calculated  from  the  CC,  MLO,  and  magnification 
views  vary.  To  date  we  have  collected  approximately  150  cancer  cases  from  digitized  films. 

Spiculation has been shown to be a dominant feature and is influenced by linear-shaped parenchymal
structures that traverse the lesion on the 2-D projection image. This is one of the reasons
radiologists prefer to have a computer rating given per view as opposed to per case, since the
projected view of a lesion and a linear parenchymal pattern could lead to an erroneous increase in the
degree of spiculation as calculated by the computer method. It is important that radiologists
understand what features the computer is “looking” at and understand whether the computer
under-calls or over-calls a lesion.

Incorporation of the computer-vision methods with a Mammo/Icon mammographic review
system  for  enhanced  diagnosis.  Results  to  date 

Dr. Swetts at Yale has left academia and gone into private practice in Seattle. No grant funds
have been transferred to him. Instead, researchers on the team at the University of Chicago are creating
a “Mammo/Icon-like” system. The features (as well as the merged values from the artificial neural
network) from the malignant and benign cases are tabulated and retained in the computer. We are
currently  developing  software  to  first  retrieve  lesions  with  similar  ANN  outputs.  Next  we  will  develop 
the  methods  to  retrieve  lesions  with  similar  individual  features  -  for  example,  spiculated  vs.  non- 
spiculated  or  for  non-spiculated  lesions:  low  density  vs.  high  density  lesions. 

Evaluation of the CAD methods for mammography. Results to date

Databases  are  continuously  being  collected.  For  mass  detection,  we  have  approximately  150 
clinical cases of malignant masses. New data for the classification database include the 150
malignant  cases  as  well  as  100  benign  cases.  The  complete  statistical  evaluation  will  be  performed  at  a 
later  date  when  the  databases  are  complete. 

We have evaluated the mass classification method at RSNA ’98. The mass classification method
was run on both the MLO and CC views and the magnification views. We showed that the average
performance of 128 radiologists (who participated in the study) increased significantly from an Az of
0.89 to an Az of 0.94 (p < 0.05) when the computer aid was used in distinguishing 20 mass lesion
cases.


KEY  RESEARCH  ACCOMPLISHMENTS 

1.  Improvements  in  the  computerized  detection  of  mass  lesions  on  mammograms 

•  Incorporation  of  temporal  image  data 

•  Development  of  new  single  image  detection  method  instead  of  bilateral  subtraction 

2.  Improvements  in  the  computerized  classification  of  mass  lesions  on  mammograms 

•  Investigation  of  a  hybrid  (rule-based  plus  ANN)  system  for  classification 

•  Validation  on  an  independent  database  showing  robustness  of  the  method 

3.  Development  of  a  new  lesion  extraction  (region  growing)  method 

4.  Investigation  into  feature  selection  and  feature  merging  with  limited  datasets 

5. Continued experience with a clinical “prototype” intelligent workstation

Abstract: Segmenting lesions is a vital step in many computerized
mass-detection schemes for digital (or digitized) mammograms.
We have developed two novel lesion segmentation
techniques, one based on a single feature called the radial gradient
index (RGI) and one based on simple probabilistic models,
to segment mass lesions, or other similar nodular structures,
from surrounding background. In both methods a series of image
partitions is created using gray-level information as well as prior
knowledge of the shape of typical mass lesions. With the former
method the partition that maximizes the RGI is selected. In the
latter method, probability distributions for gray levels inside and
outside the partitions are estimated, and subsequently used to
determine the probability that the image occurred for each given
partition. The partition that maximizes this probability is selected
as the final lesion partition (contour). We tested these methods
against a conventional region growing algorithm using a database
of biopsy-proven, malignant lesions and found that the new lesion
segmentation algorithms more closely match radiologists’ outlines
of these lesions. At an overlap threshold of 0.30, gray-level
region growing correctly delineates 62% of the lesions in our
database while the RGI and probabilistic segmentation algorithms
correctly segment 92% and 96% of the lesions, respectively.

Index Terms: Computer-aided diagnosis, digital mammography,
lesion segmentation, mass detection.


I.  Introduction 

The University of Chicago is currently developing computerized
schemes to detect mass lesions in digital (or
digitized) mammograms [1]-[3]. Many computerized schemes
initially  return  a  number  of  locations  called  “potential  lesion” 
sites.  These  are  regions  that  the  computer  deems  suspicious 
and require a closer examination. A lesion segmentation algorithm
is then employed to extract the lesion or potential lesion
from  its  surrounding  tissues.  Features  can  then  be  calculated 
using  the  segmentation  information  and  classification  can  be 
accomplished  using  these  features  [4]. 

Numerous  techniques  have  been  developed  to  segment 
lesions  from  surrounding  tissues  in  digital  mammograms. 
Petrick et al. [5] employed a density-weighted contrast enhancement
(DWCE) segmentation algorithm to extract lesions and
potential lesions from their surrounding tissues. Comer et al.
[6] and Li et al. [7] used Markov random fields to classify the
different regions in a mammogram based on texture. A lesion
segmentation algorithm developed by Sameti et al. [8]
used fuzzy sets to partition the mammographic image data.
Despite  the  difficulty  and  importance  of  this  step  in  many 
computerized  mass-detection  schemes,  few  have  attempted 
to  analyze  the  performance  of  these  segmentation  algorithms 
alone,  choosing  instead  to  collectively  analyze  all  components 
of  a  scheme. 

In  this  paper,  we  present  two  methods  for  segmenting 
lesions  in  digital  or  digitized  mammograms:  a  radial  gradient 
index (RGI)-based algorithm and a probabilistic algorithm.
These  techniques  are  seeded  segmentation  algorithms;  they 
begin  with  a  point,  called  the  seed  point,  which  is  defined  to 
be  within  the  suspect  lesion.  Many  current  computerized  mass- 
detection  schemes  first  employ  an  initial  detection  algorithm 
which  returns  locations  that  are  used  as  seed  points  for 
the  segmentation  algorithm.  In  our  previous  research  [4],  a 
region  growing  algorithm  [9],  [10]  was  performed  to  extract 
the  lesion  from  its  surrounding  tissues.  Region-growing  is 
a  local  thresholding  process  which  utilizes  only  the  gray- 
level  information  around  the  seed  point.  A  series  of  partitions 
containing  the  seed  point  is  created  by  thresholding,  and 
rules  (relating  to  size  and  circularity,  for  example)  determine 
which  partition  best  segments  the  suspect  lesion.  Potential 
problems  with  such  methods  are  that  the  rules  devised  to 
choose  the  suspect  lesion’s  partition  are  heuristic  and  often 
based  on  the  first  or  second  derivatives  of  noisy  data.  The 
new  methods  discussed  in  this  paper  attempt  to  solve  the 
problems  associated  with  conventional  region  growing  by 
utilizing  shape  constraints  to  regularize  the  partitions  analyzed, 
and  simplifying  the  partition  selection  process  by  using  utility 
functions  based  either  on  a  single  feature  or  probabilities. 
The  performance  of  the  two  methods  is  compared  against 
radiologists’  outlines  on  a  screening  database  of  malignant 
lesions. 

II.  Lesion  Segmentation 

Given a subimage or region-of-interest (ROI) of dimension
$n$ by $m$ containing the suspect lesion, we define the set of
coordinates in this subimage as

$$X = \{(x,y) : x = 1, 2, \ldots, n \text{ and } y = 1, 2, \ldots, m\}. \tag{1}$$

The function describing the pixel gray levels of this subimage
is given by $f(x,y)$ where $(x,y) \in X$. The values of $f(x,y)$,
for this work, are bounded between zero and one, with a zero
representing black and a one representing white. The pixel values
for all images were normalized to be within this range by
dividing by the maximum pixel value possible for the digitizer
used. The task of a lesion segmentation algorithm is to partition
the set $X$ into two sets: $L$, which contains the coordinates of
lesion pixels, and its complement, which contains surrounding background
pixels. The lesion segmentation algorithms described in this
paper are seeded segmentation algorithms; an initial point is
used to start the segmentation. The seed point $(\mu_x,\mu_y)$ is
defined to be within the lesion, i.e., $(\mu_x,\mu_y) \in L_i$ for all $i$. In
addition, the perimeter of the set $L$ must be one continuous
closed contour.

Fig. 1. Partitions that can arise (b) when only gray-level information, $f(x,y)$, is utilized in segmenting lesions and (c) when the gray-level image is
multiplied by a constraint function to control the shape of the partitions, giving $h(x,y)$. The original image is shown in (a).

Fig. 2. The image (a) $f(x,y)$ of the lesion is multiplied by the Gaussian function (b) $N(x,y;\mu_x,\mu_y,\sigma_c^2)$ to constrain the partitions to have “lesion-like”
shapes, which results in (c) the function $h(x,y)$. The value of $\sigma_c^2$ was set to $12.5^2$ mm$^2$ for these images.

In order to segment the potential lesion, the “validity” of
various image partitions $L_i$, $i = 1, 2, \ldots$, is evaluated. For
conventional region growing segmentation, the partitions are
typically defined as

$$L_i^{(rg)} = \{(x,y) : f(x,y) > t_i\} \tag{2}$$

where $t_i$ is a gray-level threshold. This makes use of the
fact  that  lesions  tend  to  be  brighter  than  the  surrounding 
tissue  but  it  does  not  directly  take  shape  into  account,  i.e., 
irregular  shapes  can  be  evaluated.  Shape  is,  however,  typically 
indirectly  analyzed  in  these  methods  when  searching  for  the 


partition  to  represent  the  segmented  lesion  [10],  [11].  Fig.  1(b) 
shows  an  example  of  some  of  the  irregular  partitions  that 
can  arise  in  conventional  region  growing.  The  partitions  are 
lesion-shaped  at  high  thresholds,  but  tend  to  effuse  into  the 
background  at  lower  thresholds,  and  are  not  representative  of 
the  lesion. 

Conventional region growing defines the lesion partitions
$L_i^{(rg)}$ based solely on gray-level information in the image. The
new algorithms proposed in this paper add a priori
information to the creation of the lesion partitions. Lesions
tend to be compact, meaning that their shapes are typically
convex. To incorporate this knowledge into the creation of
the partitions, the original image is multiplied by a function,
called the constraint function, that suppresses distant pixel
values. For this study we chose to use an isotropic Gaussian
function centered on the seed point location $(\mu_x,\mu_y)$ with
a fixed variance $\sigma_c^2$ as the constraint function. The function
$h(x,y)$ resulting from the multiplication of the original ROI
with the constraint function is given by

$$h(x,y) = f(x,y)\,N(x,y;\mu_x,\mu_y,\sigma_c^2) \tag{3}$$

where $N(x,y;\mu_x,\mu_y,\sigma_c^2)$ is a circular normal distribution
[see Fig. 2(b)] centered at $(\mu_x,\mu_y)$ with a variance $\sigma_c^2$. Other
constraint functions may be more appropriate for different
segmentation tasks. We have found, however, that a Gaussian
works well for mammographic lesions.

Fig. 3. Features employed in determining the final partition for conventional
region growing. Here, $i$ corresponds to the different gray-level intervals.

Fig. 2(c) shows an example of the function $h(x,y)$. At a given threshold, the
partitions returned by thresholding are more compact than
before because distant pixels are suppressed, i.e., a geometric
constraint has been applied. The new partitions are defined as

$$L_i = \{(x,y) : h(x,y) > t_i\}. \tag{4}$$

An example is shown in Fig. 1(c). Note that all of the partitions
are now “lesion-like”; they are influenced by both the gray-level
information and the geometric constraint. The value of
the parameter $\sigma_c^2$ will be discussed later.
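
The effect of the constraint function in (3) and (4) can be illustrated with a minimal sketch. Several choices here are assumptions rather than details from the paper: the Gaussian is left unnormalised, `sigma` is expressed in pixels, each partition is restricted to the 4-connected component containing the seed, and the helper names `constrain` and `partition` and the toy image are hypothetical.

```python
import math
from collections import deque

def constrain(f, seed, sigma):
    """Eq. (3): multiply the ROI f by an isotropic Gaussian centred on the seed.
    The Gaussian is left unnormalised (an assumption); this only rescales h."""
    sx, sy = seed
    return [[f[y][x] * math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2 * sigma ** 2))
             for x in range(len(f[0]))] for y in range(len(f))]

def partition(h, seed, t):
    """Eq. (4), restricted to the 4-connected component containing the seed
    (the restriction is an assumption made for this sketch)."""
    sx, sy = seed
    if h[sy][sx] <= t:
        return set()
    region, queue = {(sx, sy)}, deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if (0 <= ny < len(h) and 0 <= nx < len(h[0])
                    and (nx, ny) not in region and h[ny][nx] > t):
                region.add((nx, ny))
                queue.append((nx, ny))
    return region

# Toy ROI: a bright 3x3 blob joined to a long bright structure of equal gray level.
f = [[0.1] * 15 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        f[y][x] = 0.9
for x in range(5, 15):
    f[3][x] = 0.9

seed = (3, 3)
plain = partition(f, seed, 0.5)                              # leaks along the structure
constrained = partition(constrain(f, seed, 2.0), seed, 0.5)  # stays compact
```

In this toy case the unconstrained partition follows the bright structure to the edge of the ROI, while the constrained partition keeps the blob plus a single adjoining pixel.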

A.  Region-Growing  Segmentation 

In conventional region growing, a feature or multiple features
may be calculated for the partitions described in (2). For
example, circularity Circ( ) and size Size( ) can be calculated
for every $L_i^{(rg)}$ as demonstrated in Fig. 3. The final partition is
chosen  by  analyzing  these  functions  and  determining  transition 
points  or  jumps  in  the  features  [4],  [10],  [11].  As  Fig.  3  shows, 
the  data  can  exhibit  multiple  transition  points,  and  determining 
a  jump  by  analyzing  the  first  derivative  of  noisy  data  is 
difficult.  If  a  transition  point  cannot  be  found,  the  segmentation 
algorithm  fails  to  return  a  final  partition. 


Fig. 4. The geometry used in calculating the RGI. The squares represent
margin pixels $M_i$ of the partition being evaluated.

Fig. 5. The RGI as a function of the different partitions $L_i$ for the image
shown in Fig. 1(a). The partition with the largest RGI value is returned as the
final lesion partition. Here, $i$ corresponds to the different gray-level intervals.

B. Radial Gradient Segmentation

Given a series of partitions $L_i$ from (4), one must determine
which of these partitions best delineates the lesion. One
method is to apply a utility function. Bick et al. [12] employed
an RGI utility function in their lesion segmentation algorithm, which
utilized Fourier descriptors to describe the shapes of lesions.
We have employed the RGI measure on the image $f(x,y)$
around the margin of each partition $L_i$ as a utility function.
For every partition $L_i$ the RGI is calculated (see Fig. 5), and
the partition with the maximum RGI is returned as the final
lesion partition. It is important to note that the partitions $L_i$
are generated using the processed image $h(x,y)$ while the RGI
measure is computed on the original image $f(x,y)$.

The RGI is computed as follows. Given a partition $L_i$ from (4),
we can define the margin as

$$M_i = \{(x,y) : (x,y) \in L_i \text{ and either } (x-1,y),\ (x+1,y),\ (x,y+1), \text{ or } (x,y-1) \notin L_i\}. \tag{5}$$


This  states  that  a  point  is  on  the  margin  if  it  has  at  least  one 
neighbor  that  is  not  in  the  lesion.  The  RGI  is  given  by 


$$\mathrm{RGI} = \Biggl(\sum_{(x,y)\in M_i} \|G(x,y)\|\Biggr)^{-1} \sum_{(x,y)\in M_i} G(x,y)\cdot\frac{\hat{r}(x,y)}{\|\hat{r}(x,y)\|} \tag{6}$$

where $G(x,y)$ is the gradient vector of $f(x,y)$ at position
$(x,y)$ and $\hat{r}(x,y)/\|\hat{r}(x,y)\|$ is the normalized radial vector
at the position $(x,y)$ (Fig. 4). The RGI is a measure of the
average proportion of the gradients directed radially outward.
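
The margin definition in (5) and the index in (6) can be sketched in a few lines. This is only an illustration under stated assumptions: gradients are central differences, $G$ is taken as the negative intensity gradient (pointing from bright to dark) so that a bright, round blob scores near one, the radial vector runs from the seed to the margin pixel, and the partition is assumed not to touch the ROI border. The function names and the toy images are hypothetical.

```python
import math

def margin(part):
    """Eq. (5): pixels of the partition with at least one 4-neighbour outside it."""
    return {(x, y) for (x, y) in part
            if any(n not in part for n in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)))}

def rgi(f, part, seed):
    """Eq. (6) evaluated on the original image f (list of rows, indexed f[y][x])."""
    sx, sy = seed
    num = den = 0.0
    for x, y in margin(part):
        # Central differences. G is the NEGATIVE intensity gradient here;
        # this sign convention is an assumption of the sketch.
        gx = -(f[y][x + 1] - f[y][x - 1]) / 2.0
        gy = -(f[y + 1][x] - f[y - 1][x]) / 2.0
        rx, ry = x - sx, y - sy
        r = math.hypot(rx, ry)
        if r == 0:
            continue  # the radial direction is undefined at the seed itself
        num += (gx * rx + gy * ry) / r   # G projected on the unit radial vector
        den += math.hypot(gx, gy)        # gradient magnitude
    return num / den if den > 0 else 0.0

# Toy images: a bright Gaussian blob and a uniform image, both 21 x 21.
n, c = 21, 10
blob = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * 4.0 ** 2)) for x in range(n)]
        for y in range(n)]
flat = [[0.5] * n for _ in range(n)]
disk = {(x, y) for y in range(n) for x in range(n) if (x - c) ** 2 + (y - c) ** 2 <= 25}

rgi_blob = rgi(blob, disk, (c, c))   # close to one for a round, bright blob
rgi_flat = rgi(flat, disk, (c, c))   # zero: no gradients on a uniform image
```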
An RGI of one signifies that all the gradients around the margin
are pointing directly outward along the radius vector, and an RGI of $-1$
indicates that all the gradients around the margin are pointing
directly inward toward the center of the partition. The RGI
value around the margin of a circular lesion, for example, is
one. If, however, $f(x,y)$ is a uniform image, then the RGI
value will be zero even if the margin $M_i$ is a circle.

C. Probabilistic Segmentation

The segmentation method based on probabilistic models is
somewhat similar to the RGI method, except that the utility
function is now a probability. The probability of pixel gray
levels given a partition $L_i$ from (4) is modeled as

$$p(f(x,y) \mid L_i, \sigma_l^2) = \begin{cases} N(f(x,y); f(\mu_x,\mu_y), \sigma_l^2), & (x,y) \in L_i \\ z(f(x,y)), & (x,y) \notin L_i \end{cases} \tag{7}$$

where $N(f(x,y); f(\mu_x,\mu_y), \sigma_l^2)$ is a normal distribution centered
at the seed point gray level $f(\mu_x,\mu_y)$ with a variance $\sigma_l^2$,
and $z(f(x,y))$ is a function to be described later. Lesions will
not exhibit a large variation in pixel values, while the tissues
surrounding the lesion may show large variation because they
may consist of both fatty and dense regions. The uniformity of
lesions is accounted for by a small-variance Gaussian function
centered around the seed pixel value. The term $z(f(x,y))$
is a function that is estimated for each ROI using the gray
levels from all of the pixels within the ROI, although it is only
employed in calculating $p(f(x,y) \mid L_i, \sigma_l^2)$ for $(x,y) \notin L_i$
[see (7)]. Finally, the probability of the image (or ROI) $I$
given a partition $L_i$ is

$$p(I \mid L_i, \sigma_l^2) = \prod_{(x,y)\in X} p(f(x,y) \mid L_i, \sigma_l^2). \tag{8}$$

The partition $L_i$ that is chosen is the one that maximizes the
probability $p(I \mid L_i, \sigma_l^2)$, i.e.,

$$L_{\text{final}} = \arg\max_{L_i} \{ p(I \mid L_i, \sigma_l^2) \}. \tag{9}$$

An example plot of $p(I \mid L_i, \sigma_l^2)$ is shown in Fig. 6. Because
there are a finite number of $L_i$, we avoid the complexity of
an optimization problem, choosing instead to evaluate all $L_i$
and determine the maximum.

The probability distribution for the gray levels when the
pixels are outside the set $L_i$ is given by the function $z(f(x,y))$
[see (7)], which is estimated from all gray levels within
the ROI. Kernel density estimation using an Epanechnikov
kernel was employed to estimate this distribution [13]. The
width of the kernel was optimally determined through cross
validation [13]. Kernel density estimation is a method similar
to histogram analysis except that a nonrectangular kernel is
used to bin data and this kernel is swept across the function
axis continuously. Histogram analysis, on the other hand, uses
a box-function that is moved in increments of the box width.
Figs. 9 and 10 show the calculated probability distributions
for gray levels inside and outside $L_i$ for the ROI’s shown in
Figs. 7 and 8, respectively.

Fig. 6. A plot of the probability that the image occurred at different $L_i$
for the image shown in Fig. 1(a). The maximum-likelihood estimate of the
partition is given by the partition which maximizes this function. Here, $i$
corresponds to the different gray-level intervals.

III. Results

A.  Parameter  Estimation 

The width $\sigma_c^2$ of the constraint function in (3) was determined
based on knowledge of lesions and was not statistically
determined. A value of $12.5^2$ mm$^2$ was empirically determined
to work well for our purposes. Larger lesions were also segmented
with this value, but spiculations and small deviations
around the edge of the lesion were usually not delineated.

The parameter $\sigma_l^2$ in (7) is an unknown quantity and must be
determined. The average variation of the gray levels within the
radiologist’s outlined truth for a screening, malignant database
of 118 visible lesions was estimated. Fig. 11 shows the density
distribution for these variations as measured by the standard
deviation of the gray levels within the radiologist’s outlines.
A value of 0.038 was determined to be the most common
standard deviation of pixel values within the radiologist’s
outlines. It is important to note that problems may arise when
the radiographic presentation of lesions in other databases is
substantially different from that in the database employed in
this study. We, however, employed a database of 60 malignant,
nonpalpable lesions obtained from roughly 700 needle biopsies
performed during the years 1987 to 1993, which should thus be
representative of the actual distribution.

The value of $\sigma_l^2$ can also be determined for each lesion
individually. Instead of just using the most probable a priori
value of $\sigma_l^2$ (as discussed above) one can apply Bayes’ theorem
to find that

$$p(\sigma_l^2 \mid I, L_i) = \frac{p(I \mid L_i, \sigma_l^2)\, p(\sigma_l^2 \mid L_i)}{p(I \mid L_i)} \tag{10}$$

where $p(I \mid L_i, \sigma_l^2)$ is given by (8). If we assume that $\sigma_l^2$ and
$L_i$ are independent, then $p(\sigma_l^2 \mid L_i) = p(\sigma_l^2)$. The distribution
of $p(\sigma_l^2)$ can be obtained from Fig. 11. Finally, we know that
$p(I \mid L_i) = \int d\sigma_l\, p(I \mid L_i, \sigma_l^2)\, p(\sigma_l^2)$, which results in

$$p(\sigma_l^2 \mid I, L_i) = \frac{p(I \mid L_i, \sigma_l^2)\, p(\sigma_l^2)}{\int d\sigma_l\, p(I \mid L_i, \sigma_l^2)\, p(\sigma_l^2)}. \tag{11}$$

Fig. 7. Segmentation results for (a) a high-contrast lesion using (b) region growing, (c) RGI-based segmentation, and (d) probabilistic segmentation.


The probability of various values of $\sigma_l^2$ could be compared
against each other and the optimal $\sigma_l^2$ estimated. Unfortunately,
to estimate $p(\sigma_l^2 \mid I, L_i)$ one must compute

$$\int d\sigma_l\, p(I \mid L_i, \sigma_l^2)\, p(\sigma_l^2) \tag{12}$$

which involves integrating over all possible values of $\sigma_l$ and
is very time consuming. Not only do we have the problem
of integrating over all $\sigma_l$ values, but the value computed is
the probability given a partition $L_i$. This leaves us with a dual
optimization task. For a given $\sigma_l^2$ the optimal partition $L_{\text{final}}$ is
determined. This partition is then employed to determine a new
optimal $\sigma_l^2$. This process continues until there is convergence.
For this research, we instead employed a constant value, i.e.,
the most probable a priori value of $\sigma_l^2$.
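
Although the per-lesion estimate was not used in this work, one way the posterior in (11) might be approximated is on a discrete grid of candidate values, with the integral in (12) replaced by a sum. The sketch below assumes this grid approximation; the function name `posterior_on_grid`, the grid, the log-likelihoods, and the prior weights are all hypothetical.

```python
import math

def posterior_on_grid(log_likelihoods, prior):
    """Discrete analogue of Eq. (11): normalise likelihood x prior over a grid."""
    m = max(log_likelihoods)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - m) * p for ll, p in zip(log_likelihoods, prior)]
    total = sum(weights)      # stands in for the integral in Eq. (12)
    return [w / total for w in weights]

sigmas = [0.02, 0.04, 0.06]            # hypothetical grid of sigma_l values
log_like = [-120.0, -100.0, -110.0]    # hypothetical log p(I | L_i, sigma_l^2)
prior = [0.3, 0.5, 0.2]                # hypothetical p(sigma_l^2), e.g. from a histogram

post = posterior_on_grid(log_like, prior)
best_sigma = sigmas[post.index(max(post))]
```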

B.  Segmentation  Performance 

Segmentation results for a relatively simple (high contrast)
lesion are shown in Fig. 7. All three methods, region growing,
RGI-based segmentation, and probabilistic segmentation,
perform well on this lesion. Region growing has somewhat
undergrown the lesion and has a long tail. The RGI-based
method and the probabilistic method segment the lesion better
than  region  growing.  Similar  images  are  shown  for  a  more 
difficult  lesion  on  a  border  between  a  fatty  region  and  the 
pectoralis  muscle  (Fig.  8).  Because  of  the  brightness  of  the 


pectoralis  muscle,  region  growing  effuses  into  the  background 
too  soon  and  thus,  the  transition  point  found  results  in  a 
grossly  undergrown  lesion.  There  are  also  vessels  that  can  be 
radiographically  seen  passing  through  the  center  of  this  lesion. 
The RGI-based segmentation algorithm chooses the boundary
of  a  vessel  as  the  best  partition  because  the  RGI  value  around 
the  vessel  is  larger  than  that  around  the  actual  lesion.  The 
probabilistic  segmentation  algorithm,  however,  does  not  get 
confused  by  the  vessel  inside  the  lesion  and  correctly  segments 
this  difficult  lesion. 

In  order  to  quantify  the  performance  differences  between 
the  three  different  segmentation  methods,  the  segmentation 
results  were  compared  against  radiologists’  outlines  of  the 
lesions.  The  screening  database  of  nonpalpable,  biopsy-proven, 
malignant  cancers  with  a  total  of  118  visible  lesion  ROI’s  was 
employed.  For  each  lesion  the  seed  point  was  calculated  from 
the  center  of  mass  of  the  radiologist’s  outline.  Once  the  lesion 
was segmented, an overlap measure $O$ was calculated using
the set returned from the segmentation algorithm, $L$, and the
radiologist’s hand-drawn segmentation set $T$. The overlap $O$
is defined as the intersection over the union, i.e.,

$$O = \frac{\operatorname{Area}(L \cap T)}{\operatorname{Area}(L \cup T)}. \tag{13}$$

Fig. 8. (a) Segmentation results for a lesion on the boundary between a fatty area and the pectoralis muscle using (b) region growing segmentation,
(c) RGI-based segmentation, and (d) probabilistic segmentation.

Fig. 9. Probability distributions employed when a pixel is inside or outside
of the set in question for the image shown in Fig. 7. The distribution employed
when $(x,y) \in L_i$ is a Gaussian centered at the seed point gray level with a
variance of $\sigma_l^2$. The distribution $z(f(x,y))$ is employed when $(x,y) \notin L_i$
and is estimated from all gray values within the ROI.

The value of $O$ is bounded between zero (no overlap) and one
(exact overlap). A threshold needs to be set in order to classify
a result as an “adequate” segmentation, i.e., if $O$ is greater than
a certain value then the lesion is considered to be correctly
segmented.

Fig. 12 shows a plot of the fraction of lesions correctly segmented
at various overlap threshold levels. The probabilistic
segmentation algorithm outperformed the other methods. Also
shown in Fig. 12 is the performance of a different radiologist
in extracting the lesions as compared with the first radiologist.
It is interesting to note that the performances of the RGI-based
and probabilistic methods are not too dissimilar from the
human performance. Region growing never yielded all lesions
correctly segmented even when the overlap threshold was zero
because the method failed to find a transition point in many
of the images.

Fig. 10. Probability distributions employed when a pixel is inside or outside
of the set in question for the ROI shown in Fig. 8. The distribution employed
when $(x,y) \in L_i$ is a Gaussian centered at the seed point gray level with a
variance of $\sigma_l^2$. The distribution $z(f(x,y))$ is employed when $(x,y) \notin L_i$
and is estimated from all gray values within the ROI.

Fig. 11. The distribution of standard deviations of the gray levels within
the radiologist’s outlined lesions for a database of 60 malignant lesions (118
ROI’s). The pixel values of the images were normalized to be between zero
and one.


Fig.  12.  The  performance  of  the  different  segmentation  methods  on  a 
database  of  malignant  lesions  as  compared  with  a  radiologist’s  outlines.  Also 
shown  is  the  agreement  of  another  radiologist’s  outlines  of  the  lesions  in  the 
databases  with  the  outlines  of  the  first  radiologist. 


IV.  Discussion 

Bayesian analysis could be applied to the probabilistic
segmentation algorithm, resulting in

$$p(L_i \mid I, \sigma_l^2) = \frac{p(I \mid L_i, \sigma_l^2)\, p(L_i)}{p(I \mid \sigma_l^2)}. \tag{14}$$

By analyzing (14) one finds that $p(L_i)$ is a term that
penalizes partitions which are not “lesion” shaped. The partitions
in our study, however, are obtained after the shape
constraint function (4) has been applied, so every partition
analyzed is “lesion” shaped and thus, a Bayesian analysis is
not necessary. If deformable contours are employed instead
of a series of lesion-shaped partitions, then Bayes’ rule (14)
should be applied.

The assumption throughout this paper has been that appropriate
partitions can be generated by gray-level thresholding
the function h(x, y) (3). This assumption, as is shown by
the results of this paper, is generally appropriate for most
lesions. There are, however, cases where thresholding h(x, y)
does not generate adequate partitions for a given lesion. In
some cases, oddly shaped lesions may be surrounded by
glandular structures which may confuse the algorithm into
calling those normal structures part of the lesion. Spiculations,
which are common in malignant lesions, are, in general, not
included in the final lesion partition because of the application
of the constraint function. The purpose of the segmentation
algorithms described in this paper, however, is to determine
the general shape of the lesions and not necessarily the detailed
shape in which all spiculations are demarcated.

There is an implicit model that arises from the density
functions employed in the probabilistic segmentation algorithm.
Equation (7) assumes that all pixels within the lesion
come from a Gaussian distribution centered at the seed point
pixel value. The lesion model from which this distribution
arises is a very simple one: a lesion has uniform gray levels
with fluctuations arising from both noise and structure. In the
future, more complex models, such as modeling a lesion as a
projection of a sphere, can be implemented. The distributions,
however, become more difficult to work with, and the
assumption of independence in (8) and (10) is no longer valid.

Different initial seed points will result in different segmentation
results. For both the RGI-based and probabilistic
segmentation algorithms, the results are very similar given
small changes in the seed point location. If, however, the
seed point is selected to be at the very edge of the lesion,
then the final partitions returned by both the RGI-based and
probabilistic algorithms will be poor.

We comparatively evaluated the three segmentation methods
at various overlap criteria (Fig. 12) because different
investigators may use different evaluation criteria as well
as different databases. Previously, we have shown that the
reported performance of a computer detection method can
greatly vary depending on the criteria used in tabulating
sensitivity and specificity [14].
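
To illustrate how the choice of overlap criterion changes the reported performance, the following minimal sketch tabulates the fraction of lesions counted as correctly segmented at a given threshold. It assumes the overlap measure is the area of intersection divided by the area of union of the computer and radiologist regions, and that a lesion counts as correct when its overlap meets or exceeds the threshold. The function names and the toy masks are hypothetical and are not taken from the study.

```python
import numpy as np

def overlap(computer_mask, radiologist_mask):
    """Intersection over union of two boolean masks (assumed overlap measure)."""
    a = np.asarray(computer_mask, dtype=bool)
    b = np.asarray(radiologist_mask, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: treat as no overlap
    return float(np.logical_and(a, b).sum() / union)

def fraction_correct(overlaps, threshold):
    """Fraction of lesions whose overlap meets or exceeds the threshold."""
    return float(np.mean(np.asarray(overlaps, dtype=float) >= threshold))

# Toy example: a 36-pixel "radiologist" outline and a 24-pixel computer region inside it.
truth = np.zeros((10, 10), dtype=bool)
truth[2:8, 2:8] = True
segmented = np.zeros((10, 10), dtype=bool)
segmented[4:8, 2:8] = True
toy_overlap = overlap(segmented, truth)  # 24 / 36

# Sweeping the threshold shows how strongly the tabulated result depends on it.
example_overlaps = [0.2, 0.5, 0.9]  # hypothetical per-lesion overlaps
sweep = {t: fraction_correct(example_overlaps, t) for t in (0.0, 0.3, 0.6)}
```

Reporting the whole sweep, as in Fig. 12, avoids tying a comparison between methods to any single criterion.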

The performance differences between the probabilistic algorithm
and the RGI-based method are small. Both, however,
substantially outperform conventional region growing. It is
expected that this better segmentation performance will, in the
future, result in more meaningful features being extracted from
potential lesion regions, and, ultimately, in better classification
of malignant lesions from normal tissue regions.


V.  Conclusion 

We have developed two new methods of seeded lesion
segmentation for use in digital mammography. These new
methods substantially outperform conventional region growing
segmentation. At an overlap threshold of 0.3, region growing
correctly identified 62% of the lesions in our database, while
the RGI-based and probabilistic segmentation methods correctly
segmented 92% and 96% of the lesions, respectively.
With these new segmentation results we hope to find and
extract new features that will help differentiate between actual
lesions and false detections, thus improving the overall
performance of computerized mass detection.

Automated Computerized Classification of
Malignant and Benign Masses on Digitized
Mammograms

Zhimin Huo, MS, Maryellen L. Giger, PhD, Carl J. Vyborny, MD, PhD,
Dulcy E. Wolverton, MD, Robert A. Schmidt, MD, Kunio Doi, PhD


Rationale and Objectives. To develop a method for differentiating
malignant from benign masses in which a
computer automatically extracts lesion features and
merges them into an estimated likelihood of malignancy.

Materials and Methods. Ninety-five mammograms depicting
masses in 65 patients were digitized. Various features
related to the margin and density of each mass were
extracted automatically from the neighborhoods of the
computer-identified mass regions. Selected features were
merged into an estimated likelihood of malignancy by
using three different automated classifiers. The performance
of the three classifiers in distinguishing between
benign and malignant masses was evaluated by receiver
operating characteristic analysis and compared with the
performance of an experienced mammographer and that
of five less experienced mammographers.

Results. Our computer classification scheme yielded an
area under the receiver operating characteristic curve
(Az) value of 0.94, which was similar to that for an experienced
mammographer (Az = 0.91) and was statistically
significantly higher than the average performance of the
radiologists with less mammographic experience (Az =
0.81) (P = .013). With the database used, the computer
scheme achieved, at 100% sensitivity, a positive predictive
value of 83%, which was 12% higher than that for
the performance of the experienced mammographer and
21% higher than that for the average performance of the
less experienced mammographers (P < .0001).

Conclusion. Automated computerized classification
schemes may be useful in helping radiologists distinguish
between benign and malignant masses and thus reducing
the number of unnecessary biopsies.

Key Words. Breast, biopsy; breast neoplasms, diagnosis;
computers, diagnostic aid; computers, neural network.


The present widespread use of mammography for early
detection of breast cancer in asymptomatic women increases
the importance of radiologists recognizing the
mammographic features that distinguish carcinomas from
benign abnormalities. Despite improvements in the criteria
used to differentiate benign from malignant lesions of
the breast (1-6), considerable misclassification of lesions
occurs in everyday clinical practice. At many centers,
only 15%-30% of mammographically detected lesions
analyzed by means of surgical breast biopsy are actually
malignant (7,8). There also is great variation (7%-40%)
in positive biopsy rates among individual radiologists (9).

Computer-aided diagnosis in mammography can be
defined as a diagnosis made by a radiologist who takes
into account the output from a computer analysis of a
mammogram. Many investigators have studied the use of
computer analysis as an aid in the early detection of
breast cancer (10-13). The development of computer aids
that help in the classification portion of a mammographic
work-up also has been studied. An objective computer
classification scheme capable of differentiating between
benign and malignant masses at a level similar to that of
experienced mammographers would help radiologists improve
accuracy, decrease variability, and reduce the number
of unnecessary biopsies.

Figure 1. Characteristics of the 95 mammographic mass images
(65 patients) in the database in terms of (a) spiculation,
(b) shape, (c) lobulation, and (d) density as rated by
an experienced mammographer (C.J.V. or D.E.W.) and (e) size
(effective diameter) calculated from the mass outlines, which
were hand drawn by an experienced mammographer (D.E.W.).

In the classification of lesions, investigators have
taken advantage of the ability of radiologists to extract
features related to the margin and density of mammographic
abnormalities and have used computers to merge
these (human-extracted) features into diagnoses (14-18).
Use of computer-based decision systems such as rule-based
methods, discriminant analysis, and artificial neural
networks (ANNs) to merge the information extracted
by either human observers or computers has been investigated
(14-19). In addition, computerized techniques can
be used to automatically extract individual image features
such as spiculation (20-22), margin sharpness (23), irregularity
(24), and density (25). Some investigators have
attempted to use multiple computer-extracted features to
classify masses (24,26).

In this study, we address the classification task in mammographic
work-up and introduce a set of morphologic
features similar to the ones used by practicing radiologists
to characterize margin and density of a mass. We then
merge these features with a spiculation measure into an estimated
likelihood of malignancy for individual lesions. It
should be noted that our fully automated computerized
method includes automated lesion segmentation, automated
feature extraction, and automated classification. The
effectiveness of each individual feature and the role of
each feature in classification of masses were studied.

To process the computer-extracted features more effectively,
a two-step rule-based method and an ANN were
used to merge these features. To overcome the limitations
of these two individual types of classifiers for this particular
task, integration of a rule-based method and an
ANN was introduced as a hybrid information-processing
approach. The hybrid system provides more power as a
computer-based classifier by allowing emulation of humans
in their information-processing and decision-making
capabilities. The ability of the three classifiers to
merge the computer-extracted features into a correct diagnosis
was evaluated in 65 patients by using receiver operating
characteristic (ROC) analysis (27,28). The performance
of the computer was compared with that of an experienced
mammographer and five radiologists with less
mammographic experience.


MATERIALS  AND  METHODS 


The database used in this study consisted of 95 clinical
mammographic images (Min-R screen/OM-1 film; Eastman
Kodak, Rochester, NY), each of which contained a
mass. Thirty-eight of the images showed benign lesions,
and 57 showed malignant lesions. The 95 mammograms
were collected from examinations of 65 patients and represented
an entire database gathered in our laboratory from
December 1985 to October 1989. Twenty-six of the 65 patients
had benign breast abnormalities, and 39 patients had
breast cancer. Both mediolateral oblique and craniocaudal
views were available for 30 of the patients (12 of 26 patients
with benign lesions and 18 of 39 patients with malignant
lesions). According to the original selection criteria,
patients were chosen who had masses that were difficult to
classify and who had undergone open biopsy or long-term
mammographic follow-up. All but one patient underwent
biopsy for the suspicion of breast cancer, and in the remaining
one patient the disease was deemed benign on the
basis of follow-up of more than 5 years. The screen-film
mammograms were digitized with an optical drum scanner
(FIP II; Fuji Film, Tokyo, Japan) at a sampling distance of
0.1 mm and 10-bit quantization.

To characterize the database, two experienced mammographers
(C.J.V., D.E.W.) rated each mass with respect
to spiculation, lobulation, shape, and density by using
a five-point scale in which 1 corresponded to not
spiculated, not lobulated, circular, or fat containing and 5
corresponded to definitely spiculated, lobulated, ovoid, or
very dense, respectively. These distributions are shown in
Figure 1a-1d. The size of each mass in terms of effective
diameter was also estimated based on the region outlined
on the computer by an experienced mammographer
(D.E.W.). The effective diameter of a mass is defined as
the diameter of the equivalent circle (whose area is the
same as the area of the grown region) of the identified
mass region (29). The distribution of size in terms of effective
diameter for the masses depicted on the 95 images
is shown in Figure 1e; the average size was approximately
1.3 cm.
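
As a worked illustration of the effective-diameter definition above, the sketch below converts a region area in pixels to the diameter of the circle with the same area, using the 0.1-mm sampling distance stated for the digitizer. The function name and the example pixel count are hypothetical.

```python
import math

def effective_diameter_mm(area_pixels, pixel_size_mm=0.1):
    """Diameter (mm) of the circle whose area equals the region's area."""
    area_mm2 = area_pixels * pixel_size_mm ** 2
    return 2.0 * math.sqrt(area_mm2 / math.pi)

# A hypothetical region of 13,273 pixels covers about 132.7 mm^2,
# which corresponds to an effective diameter of roughly 13 mm (1.3 cm).
example_diameter = effective_diameter_mm(13273)
```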

Our current classification scheme consists of three
stages: (a) automated segmentation of mammographic
masses from surrounding parenchyma, (b) automated feature
extraction, and (c) automated classification, which
yields an estimation of malignancy of a mass by means of
one of three classifiers: a rule-based method, an ANN,
or a hybrid system (ie, a combination of a one-step rule-based
method and an ANN).

The area under the ROC curve (Az) was used to evaluate
the ability of our computer classification scheme to
utilize the three different classifiers to differentiate benign
from malignant masses. Clinically, the specificities
at high sensitivity levels are most relevant because the
"cost" of missing a cancer is greater than the cost of performing
a biopsy to assess a benign lesion. Thus, the average
performances in a high sensitivity range (true-positive
fraction [TPF] above 0.90) were evaluated for both
our classification schemes and the observers by using a
partial area index, TPFAz', from 0 to 1, which is the portion
of the Az that lies above a preselected sensitivity threshold
(TPF0) in a conventional ROC graph divided by the
constant (1 - TPF0) (30). These performances, in terms of
specificity at a given sensitivity level, were also evaluated.
In this study, we chose to calculate specificity at a
sensitivity level of 100% because the aim of creating the
computer output in our research was to aid radiologists in
reducing the number of unnecessary biopsies performed
without misclassifying any cancers.
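
The partial area index described above can be sketched numerically as follows. This is only an illustration under stated assumptions: it works on a piecewise-linear ROC curve given as (FPF, TPF) points rather than on the fitted curves used in ROC analysis, and the function name, the grid size, and the example curves are hypothetical.

```python
import numpy as np

def partial_area_index(fpf, tpf, tpf0=0.90, n_grid=10001):
    """Area under the ROC curve lying above TPF = tpf0, divided by (1 - tpf0).

    fpf must be strictly increasing; the curve is linearly interpolated
    between the supplied points (an assumption of this sketch).
    """
    fpf = np.asarray(fpf, dtype=float)
    tpf = np.asarray(tpf, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid)
    curve = np.interp(grid, fpf, tpf)
    # Keep only the part of the curve that rises above the sensitivity threshold.
    excess = np.clip(curve - tpf0, 0.0, None)
    # Trapezoidal integration over the false-positive fraction.
    area = np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(grid))
    return float(area / (1.0 - tpf0))

# A chance-level curve (the diagonal) gives (1 - TPF0) / 2.
chance_index = partial_area_index([0.0, 1.0], [0.0, 1.0])
# A curve that stays at TPF = 1 over the whole range gives the maximum of 1.
ideal_index = partial_area_index([0.0, 1.0], [1.0, 1.0])
```

Dividing by (1 - TPF0) is what normalizes the index to the range from 0 to 1.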

Segmentation 

Segmentation of a mass from the background parenchyma
was accomplished by using a multiple-transition-point,
gray-level, region-growing technique (22). Segmentation
begins within a 512 x 512-pixel region of
interest manually centered about the abnormality in question,
as illustrated in Figure 2a and 2b. In clinical practice,
the location of the mass could be identified either by
a radiologist or with a computer-detection scheme (31)
and then fed into the classification scheme for an output
in regard to the likelihood of malignancy. To correct
for the nonuniformity of the background distribution and
to enhance image contrast for better segmentation of
masses, background trend-correction and histogram-equalization
techniques were applied to the 512 x 512-pixel
region of interest (22). The corresponding enhanced
images of the malignant and benign masses are shown in
Figure 2c and 2d, respectively. The computer-identified
margins of the malignant and benign masses are superimposed
on the images of the original masses in Figure 2e
and 2f. For comparison, margins of the same images hand
drawn by an experienced mammographer are shown in
Figure 2g and 2h.

Computer-extracted  Radiographic  Features: 
Margin  and  Density 

The margin, shape, and density of a mass are three
major characteristics used by radiologists in classifying
masses. Different characteristics of these features are associated
with different levels of probability of malignancy
(4,6,32). To determine the likelihood of malignancy
associated with different margin and density characteristics,
we developed algorithms that extract two
features that characterize the margin of a mass (spiculation,
sharpness) and three features that characterize the
density of a mass (average gray level, contrast, texture).
We did not explicitly devise a specific measure to characterize
the shape of a mass for the purpose of classification,
but measures related to shape are embedded within
the other measures.

Figure 2. Mammographic images of (a) a malignant mass and
(b) a benign mass in a 512 x 512-pixel region of interest;
enhanced images of the (c) malignant and (d) benign
masses after image processing; and computer-extracted
margins superimposed on the (e) malignant and (f) benign
masses (Fig 2 continues).

Margin. Margin characteristics are very important in
differentiating between benign and malignant masses. To
determine the likelihood of malignancy of a mass based
on its margin, two major margin characteristics, spiculation
and sharpness, were measured. Margin spiculation
is the most important indicator for malignancy, with
spiculated lesions having a greater than 90% probability
of malignancy (6). Margin sharpness is also very important
in determining whether a mass is benign or malignant;
an ill-defined margin indicates possible malignancy,
and a well-defined margin indicates likely benignity.
Only about 2% of well-defined masses are malignant (2).




Vol  5,  No  3,  March  1998 






Figure 3. Illustration of the four neighborhoods used for
feature extraction: (a) grown region, (b) margin, (c) encompassing
region, and (d) surrounding periphery (cross-hatched
region).


The spiculation measure is determined from an analysis
of radial edge gradients (22). The spiculation measure
evaluates the average angle (in degrees) by which the direction
of the maximum gradient at each point along the
margin of a mass deviates from the radial direction, the
direction pointing from the geometric center of the mass
to the point on the margin. The actual measure is the full
width at half maximum (FWHM) of the normalized edge-gradient
distribution calculated for a neighborhood of the
grown region of the mass with respect to the radial direction
(22). This measure is able to quantify the degree of
spiculation of a mass primarily because the direction of
maximum gradient along the margin of a spiculated mass
varies greatly from its radial direction, whereas the direction
of the maximum gradient along the margin of a
smooth mass is similar to its radial direction. The spiculation
measure was extracted not only along and within the
margin of a mass (Fig 3a, 3b) but also in enlarged neighborhoods
of the computer-identified mass region as
shown in Figure 3c and 3d. In this way, potentially more
subtle spicules that are difficult to delineate by region
growing could be better extracted. The two enlarged
neighborhoods included 20 additional pixels around the
computer-identified mass region. A neighborhood of this
size is large enough to accommodate thin or short spicules
radiating from the margin of a mass.

Figure 2 (continued). An experienced mammographer's
hand-drawn margins of the (g) malignant and (h) benign masses.

Figure 4. Correlation of the spiculation measure (weighted
FWHM, in degrees) with the spiculation ratings (Fig 1a) of an
experienced mammographer for a database of 95 mass images.
The error bars indicate the variation in the spiculation
measure for each spiculation rating given by the radiologist.

To maximize the sensitivity of the spiculation measure,
all the possible signs of spiculation identified from the four
neighborhoods were considered; the greatest value of the
FWHM measures from the four neighborhoods was used to
indicate the spiculation of a mass. However, because of differences
in the ability of FWHM measures from the four
neighborhoods to capture spiculation information (22), the
FWHM measures were weighted differently. The weighting
factor used for the two enlarged neighborhoods was 1.0, and
that used for the other two neighborhoods was 0.85. This
weighted spiculation measure correlates well with an experienced
mammographer's spiculation rating (r = .64; P <
.0001) (Fig 4). In addition, the level of performance of the
spiculation measure (Az = 0.88) was similar to that of the experienced
mammographer's spiculation ratings (Az = 0.85)
in terms of the ability to distinguish between benign and malignant
masses based solely on spiculation (22).
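
The weighting step can be sketched as follows. The weights of 1.0 and 0.85 are those stated above; the order of operations (weighting each FWHM value and then taking the greatest) is an assumption of this sketch, and the neighborhood keys and example values are hypothetical.

```python
# Weights stated in the text: 1.0 for the two enlarged neighborhoods
# (Fig 3c, 3d) and 0.85 for the grown region and margin (Fig 3a, 3b).
NEIGHBORHOOD_WEIGHTS = {
    "grown_region": 0.85,
    "margin": 0.85,
    "encompassing_region": 1.0,
    "surrounding_periphery": 1.0,
}

def weighted_spiculation(fwhm_by_neighborhood, weights=NEIGHBORHOOD_WEIGHTS):
    """Greatest weighted FWHM (degrees) over the four neighborhoods.

    Assumes each FWHM is multiplied by its weight before the maximum is taken.
    """
    return max(weights[name] * fwhm for name, fwhm in fwhm_by_neighborhood.items())

# Hypothetical FWHM values (degrees) for one mass.
example = {
    "grown_region": 200.0,           # weighted: 170.0
    "margin": 190.0,                 # weighted: 161.5
    "encompassing_region": 180.0,    # weighted: 180.0
    "surrounding_periphery": 165.0,  # weighted: 165.0
}
example_measure = weighted_spiculation(example)
```

In this example the unweighted maximum comes from the grown region, but after weighting the encompassing region determines the measure.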

Figure 5. Cluster plot of the spiculation measure (weighted
FWHM) versus the margin sharpness measure (average gradient)
along the margin for 95 mass images. The horizontally
drawn line indicates the cutoff on the FWHM measure chosen
to distinguish between spiculated and nonspiculated
masses.

Figure 6. Examples of masses with (a) a spiculated margin,
(b) an ill-defined margin, (c) a partially ill-defined margin, and
(d) a well-defined margin shown by mammography.

Table 1
Examples of Spiculation and Margin-Sharpness Measures for Four Selected Masses

Mass Type                        Radiologist's        FWHM Measure   Average Gradient
                                 Spiculation Rating   (degrees)      Along the Margin
Spiculated                       5                    240            513
Ill defined/obscured             3                    118            318
Partially ill defined/obscured   3                    111            962
Well defined                     2                    111            1,315
Maximum value in the database    5                    242            1,315
Minimum value in the database    1                    88             127

The sharpness of the margin of a mass can be described
as well defined, partially ill defined, or ill defined.
The average margin sharpness can be quantified by calculating
the magnitude of the average gradient along the
margin of the mass. A well-defined margin has a large
value for the average margin sharpness measure, whereas
an ill-defined margin has a small value.
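
A margin-sharpness measure of this kind could be computed along the following lines. This is a minimal sketch that assumes the measure is the mean of the per-pixel gradient magnitudes over the margin pixels; the function name, the finite-difference gradient, and the toy image are illustrative choices, not the study's implementation.

```python
import numpy as np

def average_margin_gradient(image, margin_mask):
    """Mean gradient magnitude over the pixels flagged as margin."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # d/drow, d/dcolumn
    magnitude = np.hypot(gx, gy)
    return float(magnitude[np.asarray(margin_mask, dtype=bool)].mean())

# Toy image: gray level increases by 3 per pixel from left to right,
# so the gradient magnitude is 3 everywhere.
ramp = np.tile(3.0 * np.arange(8), (8, 1))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 3] = True  # a short vertical "margin" segment
ramp_sharpness = average_margin_gradient(ramp, mask)
```

A sharper edge produces larger gray-level steps across the margin and therefore a larger value, which matches the behavior described for well-defined margins.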

Figure 5 shows the relationship between the two margin
measures for the 95 mass images. The horizontally drawn
line indicates a cutoff on the FWHM measure used to categorize
spiculated masses and nonspiculated masses. It
should be noted that there is much more overlap between
benign and malignant masses in terms of margin sharpness
than in terms of margin spiculation. With the threshold of
160° for the spiculation measure, most of the malignant
masses were in the spiculated category (FWHM > 160°).
At this threshold, five of 39 malignant masses and 22 of 26
benign masses were classified as nonspiculated. In addition,
in the nonspiculated category, masses with a higher
value for the margin-sharpness measure tended to be benign.
This finding is in agreement with radiologists' visual
perception in determining the benign versus malignant nature
of masses. Thus, to determine the likelihood of malignancy
of a mass based on the two margin characteristics
described, it is more effective to use first the spiculation
measure to identify spiculated masses (which are very
likely to be malignant) and to determine their likelihood of
malignancy based on their degree of spiculation. The margin-sharpness
measure can then be used further to determine
the likelihood of malignancy of the remaining (ie,
nonspiculated) masses.

Figure 6 shows examples of masses with spiculated, ill-defined,
partially ill-defined, and well-defined margins.
The calculated spiculation and margin-sharpness measures
for these four masses are listed in Table 1. The spiculated
mass (radiologist's spiculation rating of 5) had a FWHM
measure of 240° in a database with a maximum degree of
spiculation of 242° and a minimum of 88°. This mass was
correctly identified as highly spiculated and thus was not
further analyzed with the margin-sharpness measure. The
three smoother masses, each with spiculation ratings by the
radiologist of 2 or 3, had similar spiculation measures that
ranged from 111° to 118°. They were classified as nonspiculated
masses (FWHM < 160°) and were further evaluated
with the margin-sharpness measure. The margin-sharpness
measures of the three masses were well separated,
with the well-defined margin having the highest
value (sharpness of 1,315), the partially ill-defined margin
having the second highest value (sharpness of 962), and the
ill-defined margin having the lowest value (sharpness of
318) in a database with margin-sharpness measures that
ranged from 127 to 1,315. This illustrates the usefulness of
the margin-sharpness measure in further discriminating between
masses in the nonspiculated category.
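
The two-step use of the margin measures can be sketched with the four example masses from Table 1. The 160° cutoff and the measured values come from the text; the function name, the returned tuple, and the handling of a value exactly equal to the cutoff (treated here as nonspiculated) are assumptions of this sketch.

```python
SPICULATION_CUTOFF_DEG = 160.0  # FWHM cutoff reported in the text

def select_margin_feature(fwhm_deg, sharpness):
    """Return (category, feature name, value) chosen by the two-step rule.

    Step 1: masses above the FWHM cutoff are spiculated and are assessed
            by their degree of spiculation.
    Step 2: the remaining masses are assessed by margin sharpness.
    """
    if fwhm_deg > SPICULATION_CUTOFF_DEG:
        return ("spiculated", "fwhm", fwhm_deg)
    return ("nonspiculated", "sharpness", sharpness)

# (FWHM in degrees, average gradient along the margin) from Table 1.
TABLE_1 = {
    "spiculated": (240, 513),
    "ill defined": (118, 318),
    "partially ill defined": (111, 962),
    "well defined": (111, 1315),
}
results = {name: select_margin_feature(*values) for name, values in TABLE_1.items()}

# Within the nonspiculated group, sort by sharpness (highest first).
nonspiculated = sorted(
    (name for name, r in results.items() if r[0] == "nonspiculated"),
    key=lambda name: results[name][2],
    reverse=True,
)
```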

Density. Although the radiographic density of a mass
may not by itself be as powerful a predictor as the margin
features in distinguishing between benign and malignant
masses, taken with these features density assessment can
be extremely useful (4). The evaluation of the density of
a mass is of particular importance in diagnosing circumscribed,
lobulated, indistinct, or obscured masses (4) that
are not spiculated.

Figure 7. Cluster plots of the FWHM measure versus various
secondary features: (a) average gray value of the extracted
mass, (b) gray-value difference between a mass and its
surrounding "fatty area," and (c) texture measure. The horizontally
drawn line indicates the selected threshold for the
FWHM measure that maximally separated spiculated from
nonspiculated masses.

To assess the density of a mass radiographically, we
introduced three density-related measures (average gray
level, contrast, texture) that characterize different aspects
of the density of a mass. These measures are similar to
those used intuitively by radiologists. Average gray level
is obtained by averaging the gray-level values of each
point within the grown region of a mass. Contrast is the
difference between the average gray level of the grown
mass and the average gray level of the surrounding fatty
areas (areas with gray-level values in the lower 20% of
the histogram for the total surrounding area). Texture is
defined here as the standard deviation of the average gradient
within a mass, and it is used to quantify patterns
that arise from veins, trabeculae, and other structures that
may be visible through a low-density mass but not
through a high-density mass. A mass of low radiographic
density should have low values for average gray level and
contrast and a high value for texture, whereas a mass of
high radiographic density should have high values for average
gray level and contrast and a low value for texture.
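
The three density measures can be sketched as below. Several details are assumptions made for illustration: "the lower 20% of the histogram" is read as the surrounding pixels at or below the 20th percentile of gray level, texture is taken as the standard deviation of the gradient magnitude inside the mass, and the function and argument names are hypothetical.

```python
import numpy as np

def density_measures(image, mass_mask, surround_mask):
    """Return (average gray level, contrast, texture) for one mass."""
    image = np.asarray(image, dtype=float)
    mass = np.asarray(mass_mask, dtype=bool)
    surround = np.asarray(surround_mask, dtype=bool)

    # Average gray level within the grown region.
    average_gray = float(image[mass].mean())

    # "Fatty" surround: pixels in the lowest 20% of the surrounding gray levels.
    surround_values = image[surround]
    fatty_cutoff = np.percentile(surround_values, 20)
    fatty_mean = float(surround_values[surround_values <= fatty_cutoff].mean())
    contrast = average_gray - fatty_mean

    # Texture: spread of the gradient magnitude inside the mass.
    gy, gx = np.gradient(image)
    texture = float(np.hypot(gx, gy)[mass].std())
    return average_gray, contrast, texture

# Toy example: a uniform mass of gray level 100 on a background of 10.
toy = np.full((8, 8), 10.0)
toy_mass = np.zeros((8, 8), dtype=bool)
toy_mass[2:6, 2:6] = True
toy[toy_mass] = 100.0
toy_average, toy_contrast, toy_texture = density_measures(toy, toy_mass, ~toy_mass)
```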

The relationships of the three density measures and the
spiculation measure for the 95 mass images are shown in
Figure 7. The drawn line indicates a cutoff, based on the
spiculation measure (FWHM of 160°), that categorizes
spiculated and nonspiculated masses. As can be seen
from these cluster plots, the distribution of benign and
malignant nonspiculated masses in terms of the density
agrees with radiologists' general perception; namely, the
benign masses in the nonspiculated category tend to have
low image density, whereas the malignant masses in the
nonspiculated category tend to have high image density
for all three density measures. The ability to separate
low-density benign masses from high-density malignant
masses only in the nonspiculated category stresses the
importance of using the density measures to differentiate
between benign and malignant masses only after having
excluded the spiculated masses.

Classification  of  Masses 

The ability of each individual computer-extracted feature
to aid in the differentiation between benign and malignant
mass images was evaluated for the entire database
with ROC analysis. The calculated Az values are listed in
Table 2. The ROC analysis shows that the spiculation
measure outperformed the other features in distinguishing
between benign and malignant masses; the Az value was
0.88 for spiculation compared with 0.54-0.65 for the
other four features. We have therefore found that margin
spiculation is as important a feature for the computerized
method as it is for radiologists.

After the rule based on the spiculation measure
(FWHM of 160°) was applied, the ability of these features
to further distinguish between benign and malignant
mass images in the remaining database (nonspiculated)
was also studied with ROC analysis. The calculated Az
values for these features are listed in Table 2. As can be
seen in Table 2, the spiculation measure is no longer a
dominant feature in discriminating between benign and
malignant masses in the nonspiculated category. The
other four features perform better, however, in differentiating
malignant from benign masses in the nonspiculated
category than in the complete database (ie, both spiculated
and nonspiculated). This finding indicates the importance
of using these features to differentiate between
benign and malignant masses only after the spiculated
masses have been excluded.
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Table  2 

Performances  of  the  Five  Computer-extracted  Features  in 
Distinguishing  between  Benign  and  Malignant  Mass  Images 


Feature 

A  for  All  95 
Mass  Images 

Az  for  36  Mass 
Images  in  the 
Nonspiculated 
Category 

Margin 

Spiculation  (FWHM)* 

0.88 

0.53 

Sharpness* 

0.56 

0.68 

Density 

Average  gray  level* 

0.65 

0.66 

Contrast 

0.59 

0.70 

Texture  measure* 

0.54 

0.71 

*These  features  were  used  as  inputs  in  the  ANN  and  the 
combined  rule-based  ANN  classifiers. 


Three automated classifiers were investigated for the task
of merging the computer-extracted features into an estimate
of the likelihood of malignancy: a rule-based method, an
ANN, and a hybrid system. In determining the likelihood of
malignancy for the cases that had both the mediolateral
oblique and the craniocaudal views, the measurements
obtained from both views were considered, and the view for
which the computer estimated the higher likelihood of
malignancy was used in the evaluation. For example, a mass
would be classified as malignant if either one of the two
views showed suspect signs (ie, either one of the FWHM
measures from its two views satisfied the cutoff on the
FWHM measure).
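
The per-case combination of views described above can be sketched as taking the larger of the per-view estimates. This is only an illustration: the function name, the dictionary layout, and the scores are assumptions, not the study's software or data.

```python
def case_likelihood(view_scores):
    """Return the case-level estimate: the highest per-view likelihood."""
    if not view_scores:
        raise ValueError("a case needs at least one view")
    return max(view_scores.values())


# Hypothetical classifier outputs in [0, 1] for one case with two views.
scores = {"MLO": 0.35, "CC": 0.80}
estimate = case_likelihood(scores)
print(estimate)
```

Taking the maximum means a mass is called suspicious as soon as either view looks suspicious, which favors sensitivity over specificity.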

Rule-based  method. — A  rule-based  method  adapts 
knowledge  from  experts  into  a  set  of  simple  rules.  Certain 
criteria  for  differentiating  between  benign  and  malignant 
masses  have  been  established  by  expert  mammographers 
(4,6,32).  The  rules  used  in  our  approach  for  measures  of 
spiculation,  margin  sharpness,  and  density  were  based  on 
these  criteria. 

A two-step rule-based method was studied for this database.
Because of its clinical diagnostic importance, the
spiculation measure was applied first in our rule-based
method. After the spiculation measure (FWHM) was applied to
identify spiculated masses (including some irregular masses)
and categorize them as malignant first, a second feature was
applied to characterize further the masses in the
nonspiculated category as discussed in the previous section.
To investigate the potential discriminant ability of the
spiculation measure along with all the possible secondary
features, we applied separately each of the remaining four
features (the margin-sharpness measure and the three density
measures) after the spiculation measure. The threshold of
the spiculation measure (FWHM = 160°) was determined based
on the entire database. The thresholds of the other four
features were determined based on the remaining database
only.
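
A minimal sketch of such a two-step rule follows. Only the 160° FWHM cutoff comes from the text. The use of "greater than or equal to" at the cutoff, the secondary cutoff value, and the direction in which the secondary feature points toward malignancy are assumptions made for illustration.

```python
FWHM_CUTOFF_DEG = 160.0  # spiculation threshold stated in the text


def two_step_rule(fwhm_deg, second_feature, second_cutoff,
                  higher_is_malignant=True):
    """Label a mass 'malignant' or 'benign' with a two-step rule."""
    # Step 1: masses at or above the FWHM cutoff are treated as spiculated
    # (or irregular) and are called malignant without further analysis.
    if fwhm_deg >= FWHM_CUTOFF_DEG:
        return "malignant"
    # Step 2: one secondary feature is thresholded for the remaining
    # (nonspiculated) masses. The cutoff here is hypothetical.
    if higher_is_malignant:
        suspicious = second_feature >= second_cutoff
    else:
        suspicious = second_feature <= second_cutoff
    return "malignant" if suspicious else "benign"


print(two_step_rule(170.0, 0.1, 0.5))  # flagged by the spiculation rule
print(two_step_rule(120.0, 0.3, 0.5))  # decided by the secondary feature
```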

ANN. — The  ANN  approach  is  very  different  from  the 
rule-based method. Instead of using predetermined empirical
algorithms based on prior knowledge, ANNs are able to learn
from examples and therefore can acquire their own knowledge
through learning. Also, neural networks are capable of
processing large amounts of information simultaneously.
Neural networks do not, however, provide the user with
explanations for their decisions and may not be able to
bring preexisting knowledge into the network.

Here we used a conventional three-layer, feed-forward neural
network with a back-propagation algorithm, which has been
used in medical imaging and medical decision making (33,34).
The structure of the neural network included four input
units (each of which corresponded to a computer-extracted
feature), two hidden units, and one output unit. The four
features used as inputs to the ANN were the FWHM measure,
the margin-sharpness measure, and two density measures
(indicated by asterisks in Table 2). Similar performances
were obtained when all three density measures were used.
Because limiting the number of input features is critical in
reducing the number of training samples needed, we kept the
number of inputs to the ANN to a minimum; thus, only two
density measures were used.
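
For readers unfamiliar with this architecture, a 4-2-1 feed-forward network trained by back-propagation can be sketched in a few lines of plain Python. The sigmoid activations, squared-error loss, bias terms, learning rate, and random initialization are all assumptions made for illustration; the text does not specify them, and this is not the network used in the study.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """One-hidden-layer feed-forward net trained by plain backpropagation."""

    def __init__(self, n_in=4, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # Each hidden unit holds n_in weights followed by a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # The output unit holds n_hidden weights followed by a bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out[:-1], hidden))
                      + self.w_out[-1])
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.5):
        """One gradient step on the squared error of a single example."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights change.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                ws[i] -= lr * delta_hidden[j] * v
            ws[-1] -= lr * delta_hidden[j]


# Hypothetical, already-normalized feature vector for one malignant mass.
net = TinyNet(n_in=4, n_hidden=2, seed=1)
x = [1.0, 0.0, 1.0, 0.0]
before = net.predict(x)
for _ in range(300):
    net.train_step(x, 1.0)
after = net.predict(x)
print(before, after)
```

The output unit's value plays the role of the estimated likelihood of malignancy; in practice the inputs would be the normalized computer-extracted features.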

To determine the ability of our neural network to generalize
from the training cases and make diagnoses for cases that
had not been included in the database, we used a round-robin
method, also known as the leave-one-out method. In this
method, all but one of the cases were used to train the
neural network. The single case that was left out was used
to test the neural network. For the cases that had both
mediolateral oblique and craniocaudal views, both images
were left out in the round-robin training. The higher value
of the two from the round-robin test was reported as the
estimated likelihood of malignancy. This procedure was
repeated for all the cases.
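
The round-robin procedure above, with both views of a case held out together, can be sketched as follows. The `train_fn` and `score_fn` arguments and the midpoint "classifier" are hypothetical stand-ins so that the example runs on its own; in the study the trained model is the neural network, and the toy values below are not study data.

```python
def round_robin_by_case(images, train_fn, score_fn):
    """Leave-one-case-out evaluation.

    images: list of (case_id, feature, label) tuples, one per view.
    Returns {case_id: highest held-out score among that case's views}.
    """
    case_ids = sorted({case_id for case_id, _, _ in images})
    results = {}
    for held_out in case_ids:
        # All views of the held-out case leave the training set together.
        train = [(f, y) for cid, f, y in images if cid != held_out]
        test = [f for cid, f, _ in images if cid == held_out]
        model = train_fn(train)
        results[held_out] = max(score_fn(model, f) for f in test)
    return results


def train_midpoint(train):
    """Stand-in model: midpoint between the two class means of one feature."""
    pos = [f for f, y in train if y == 1]
    neg = [f for f, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def score_midpoint(midpoint, feature):
    return feature - midpoint


# Toy data: cases c1 and c4 have two views each; label 1 = malignant.
images = [("c1", 0.9, 1), ("c1", 0.7, 1), ("c2", 0.2, 0),
          ("c3", 0.8, 1), ("c4", 0.3, 0), ("c4", 0.1, 0)]
results = round_robin_by_case(images, train_midpoint, score_midpoint)
print(results)
```

Holding out both views of a case at once prevents the second view of the same mass from leaking into training.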

Hybrid  system. — Each  classifier  has  its  advantages  and 
limitations.  With  rule-based  methods,  one  could  adopt 
preexisting  knowledge  as  rules.  There  are  limitations, 
however,  in  the  availability  of  knowledge  and  knowledge 
translation.  Even  the  experts  find  it  difficult  to  articulate 
particular  types  of  “intuitive”  knowledge,  and  the  process 
of  translating  particular  knowledge  into  rules  is  limited 
by  this  expressive  power.  ANNs  are  capable  of  learning 
from  examples  and  therefore  can  acquire  their  own 
knowledge.  It  may  be  most  advantageous  to  use  ANNs 
when  intuitive  knowledge  cannot  be  explicitly  expressed 
or  is  difficult  to  translate.  The  ANN  needs  a  sufficiently 
large  database,  however,  to  learn  effectively.  Also,  with 
an  ANN  there  may  be  uncertainty  as  to  whether  the  final 
learning  goal  is  achieved  in  some  situations. 

To take advantage of both rule-based systems and ANNs in the
task of classifying masses, we integrated a rule-based
method and an ANN into a hybrid system. In the hybrid
system, we initially applied a rule on the spiculation
measure because both spiculated and irregular masses are
highly suspect for malignancy. We then applied an ANN to the
remaining masses. Basically, this method frees the ANN from
having to "learn" the importance of spiculation to the
detriment of learning the importance of the other features.

The threshold of the spiculation measure for the hybrid
system was the same as the one used in the rule-based
method. The ANN applied in the hybrid system was a
three-layer, feed-forward neural network with a
back-propagation algorithm that had a structure of three
input units (corresponding to the three remaining features
used in the ANN method), two hidden units, and one output
unit. The same round-robin method was applied to test the
generalization ability of this neural network to
differentiate between benign and malignant masses in the
nonspiculated category.
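
The flow of the hybrid system can be sketched as a rule followed by a delegated network score. The 160° cutoff is from the text. Assigning the maximum score of 1.0 to masses flagged by the rule, the `ann_score` callable, and the constant stub used to exercise it are assumptions for illustration only.

```python
def hybrid_likelihood(fwhm_deg, other_features, ann_score, cutoff_deg=160.0):
    """Estimate likelihood of malignancy with a rule followed by an ANN.

    ann_score is any callable mapping the three remaining features to a
    value in [0, 1]; it stands in for the trained 3-2-1 network.
    """
    if fwhm_deg >= cutoff_deg:
        # Spiculated or irregular: the rule decides, the ANN is not consulted.
        return 1.0
    return ann_score(other_features)


def nonspiculated_only(masses, cutoff_deg=160.0):
    """Select the masses on which the 3-2-1 network is trained and tested."""
    return [m for m in masses if m["fwhm_deg"] < cutoff_deg]


# Hypothetical stub in place of a trained network.
stub_ann = lambda features: 0.25
print(hybrid_likelihood(170.0, [0.1, 0.2, 0.3], stub_ann))
print(hybrid_likelihood(120.0, [0.1, 0.2, 0.3], stub_ann))
```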

Observer  Study 

One experienced radiologist who specializes in mammography
(D.E.W.) and five other radiologists with some experience in
mammography participated in the observer study. The
experienced radiologist characterized some features of the
database after the observer study was completed. Three of
the five other radiologists were general radiologists from
Europe participating in visiting fellowships. At the time of
the observer study, they had a total of 6 months to 2 years
of experience in mammography. One of the remaining two was a
fellow in mammography in his 3rd month of training. The
fifth radiologist had 3 months of training in mammography
(beyond the standard 2 months of training in residency) in a
combination fellowship. In the study, each observer was
asked to estimate the probability of malignancy for each of
the 65 cases by using a 100-point scale based on the
mammograms available. The performance of each observer in
distinguishing between benign and malignant masses was
evaluated by using ROC analysis. An ROC curve was generated
for the five less experienced radiologists as a group by
averaging the two binormal parameters of their individual
ROC curves (27,28).

Figure 8. ROC curves for the performance of an experienced
mammographer, five radiologists, the computerized scheme
with ANN alone, and the computerized scheme with the hybrid
system. ANN 4-2-1 = ANN with four input units, two hidden
units, and one output unit.
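
The averaging of binormal parameters mentioned above can be illustrated with the conventional binormal ROC model, in which TPF = Phi(a + b * Phi^-1(FPF)) and Az = Phi(a / sqrt(1 + b^2)). The sketch below assumes that convention; the (a, b) pairs are invented for illustration and are not the observers' fitted parameters.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def binormal_az(a, b):
    """Area under a binormal ROC curve with parameters a and b."""
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


def binormal_tpf(a, b, fpf):
    """True-positive fraction at a given false-positive fraction (0 < fpf < 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))


def average_binormal(params):
    """Average the a and b parameters of several observers' curves."""
    n = len(params)
    return (sum(a for a, _ in params) / n, sum(b for _, b in params) / n)


# Hypothetical per-observer (a, b) pairs.
observers = [(1.4, 0.9), (1.6, 1.0), (1.2, 1.1)]
a_bar, b_bar = average_binormal(observers)
print(f"group curve: a = {a_bar:.2f}, b = {b_bar:.2f}, "
      f"Az = {binormal_az(a_bar, b_bar):.3f}")
```

Averaging the parameters yields a single smooth group curve, which is not in general the same as averaging the individual Az values.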


RESULTS 


Because the computer outputs from each individual classifier
were monotonically correlated with the estimated
probabilities of malignancy, we were able to evaluate the
ability of each classifier to merge computer-extracted
features into a correct estimated probability of malignancy
based on the computer output with ROC analysis. The ROC
curves of the ANN and hybrid system are shown in Figure 8.
The Az and the partial area index 0.90Az' values of the
three classifiers are listed in Table 3. Among the three
classifiers, the hybrid system yielded the highest Az and
0.90Az' values. The specificities and positive predictive
values of the three classifiers at 100% sensitivity were
calculated. As shown in Table 3, the hybrid system yielded
the highest specificity (69.2%), the two-step rule-based
method the second highest (42.3%, 34.6%, 30.8%, 30.8%), and
the ANN the third highest (19.2%).
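
As a rough illustration of the partial area index, the sketch below assumes its usual definition (the area under the ROC curve that lies above a sensitivity of 0.90, divided by 0.10 so that a perfect classifier scores 1) together with a binormal ROC model, and evaluates it by simple numerical integration. The definition, the model, and the parameter values are assumptions made here; this is not the modified CLABROC program cited in the text.

```python
from statistics import NormalDist


def partial_area_index(a, b, tpf0=0.90, steps=20000):
    """Normalized ROC area above sensitivity tpf0 for a binormal curve.

    Uses TPF = Phi(a + b * Phi^-1(FPF)), inverted to give FPF as a
    function of TPF, and integrates (1 - FPF) over TPF in [tpf0, 1].
    """
    nd = NormalDist()
    h = (1.0 - tpf0) / steps
    area = 0.0
    for i in range(steps):
        tpf = tpf0 + (i + 0.5) * h  # midpoint rule; never reaches TPF = 1
        fpf = nd.cdf((nd.inv_cdf(tpf) - a) / b)
        area += (1.0 - fpf) * h
    return area / (1.0 - tpf0)


# Hypothetical binormal parameters, for illustration only.
print(f"{partial_area_index(2.0, 1.0):.3f}")
# A chance-level curve (a = 0, b = 1) gives (1 - 0.90) / 2 = 0.05.
print(f"{partial_area_index(0.0, 1.0):.3f}")
```

Because only the high-sensitivity part of the curve contributes, two classifiers with similar Az values can differ widely on this index.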

The performance of the hybrid system was compared with that
of the other two types of classifiers. No statistically
significant difference (P > .05) was found for the Az values
based on the evaluation from the CLABROC program (35,36).
Statistically significant differences for the 0.90Az' values
were found, however, at the levels of the two-tailed P
values as listed in Table 3. Differences in positive
predictive value and specificity at 100% sensitivity between
the hybrid system and the two-step rule-based method on
average were 13% and 34%, respectively. Differences in
positive predictive value and specificity at 100%
sensitivity between the hybrid system and the ANN were 18%
and 50%, respectively.

Table 3
Performances of the Three Classifiers in Distinguishing between
the 26 Benign and the 39 Malignant Masses

                                             Positive       Full    Partial
                                             Predictive     Curve   Area Index
Classifier                     Specificity*  Value*         Az      0.90Az'     P†

Two-step rule-based system
(1st rule on FWHM)
  Margin sharpness             34.6%         69.6%          0.92    0.59        0.014
  Average gray level           30.8%         68.4%          0.90    0.45        0.001
  Contrast                     30.8%         68.4%          0.92    0.58        0.021
  Texture measure              42.3%         72.2%          0.92    0.63        0.015
ANN (4-2-1)                    19.2%         65.0%          0.90    0.40        0.008
Hybrid system (rule-based
and ANN 3-2-1)                 69.2%         83.0%          0.94    0.73

Note: ANN 4-2-1 = ANN with four input units, two hidden units,
and one output unit; ANN 3-2-1 = ANN with three input units,
two hidden units, and one output unit.
*Sensitivity was 100%.
†The P values were calculated for the difference in the
0.90Az' between the hybrid system and the other two classifiers.

Table 4
Performances of the Human Observers in Distinguishing between
the 26 Benign and the 39 Malignant Masses

                                             Positive       Full    Partial
                                             Predictive     Curve   Area Index
Observer                       Specificity*  Value*         Az      0.90Az'

A                              3.8%          60.9%          0.85    0.29
B                              11.5%         62.9%          0.86    0.37
C                              11.5%         62.9%          0.85    0.40
D                              0%            60.0%          0.70    0.07
E                              3.8%          60.9%          0.80    0.27
Average performance of A-E     6.1%          61.5%          0.81    0.28
Experienced mammographer       38.5%         70.9%          0.91    0.58

*Sensitivity was 100%.
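
At 100% sensitivity all 39 malignant masses are called positive, so the positive predictive value follows directly from the specificity and the case counts. The short sketch below applies that arithmetic to three of the tabulated specificities; the function name is ours, and the case counts are those given in the table titles.

```python
def ppv_at_full_sensitivity(specificity, n_malignant=39, n_benign=26):
    """Positive predictive value when every malignant case is detected."""
    false_positives = n_benign * (1.0 - specificity)
    return n_malignant / (n_malignant + false_positives)


for name, spec in [("Hybrid system", 0.692),
                   ("ANN (4-2-1)", 0.192),
                   ("Experienced mammographer", 0.385)]:
    print(f"{name}: PPV = {ppv_at_full_sensitivity(spec):.1%}")
```

With these inputs the computed values agree with the tabulated 83.0%, 65.0%, and 70.9% to one decimal place.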

The ability of each radiologist to distinguish between
benign and malignant masses was determined based on the
radiologist's subjective ratings of the probability of
malignancy for the 65 cases. The ROC curves of the
experienced mammographer and of the average performance of
the five radiologists are shown in Figure 8. Table 4 lists
their individual performances in terms of Az, 0.90Az',
positive predictive value, and specificity at 100%
sensitivity. The average performance for the five
radiologists was calculated (Table 4). The experienced
mammographer had an Az of 0.91, whereas the average of the
five radiologists yielded an Az of 0.81. The partial area
index 0.90Az' for the experienced mammographer was 0.58,
whereas the partial area index 0.90Az' for the five
radiologists was 0.28. The Student t test for paired data
was employed to evaluate the statistical significance of
these differences (16). Results showed the differences in Az
and 0.90Az' to be statistically significant (two-tailed P
values of [illegible] and .006).

The ability of the observers to distinguish malignant from
benign masses was compared with that of the computerized
method using the hybrid system. The differences in Az and
0.90Az' between the experienced mammographer and the
computerized method were found to be not statistically
significant (two-tailed P values of 0.38 and 0.30) based on
the evaluation from the CLABROC program (35,36) and the
modified version of the CLABROC program (30). Results of the
Student t test for paired data showed that the differences
in Az and 0.90Az' between the average performance of the
five radiologists and the computerized method were both
statistically significant (two-tailed P values of .0131 and
.0015). Furthermore, statistically significant differences
were found between the two in terms of the positive
predictive value and the specificity at the 100% sensitivity
level (two-tailed P values < .0001).

The differences in positive predictive value and specificity
at 100% sensitivity between the average performance of the
five radiologists and the performance of the experienced
mammographer were 9% and 32%, respectively; these
differences were also found to be statistically significant.
The differences in positive predictive value and specificity
at 100% sensitivity between the average performance of the
five radiologists and that of the computerized scheme were
even larger, 21% (P < .0001) and 63% (P = .0001),
respectively. In other words, with the database we used, at
a 100% sensitivity level (ie, no loss of malignant cases),
the average radiologists misclassified or essentially
overcalled 24 of the 26 benign cases, whereas the computer
scheme misclassified only eight of the 26 benign cases.


DISCUSSION 


We have developed a computer scheme that automatically
extracts features of masses that are similar to those
perceived by radiologists. Feature analysis has indicated
that our computer-extracted features correlate well with the
major features perceived by radiologists, as shown in
Figures 4, 5, and 7. We have shown that spiculation (FWHM
measure) is a dominant feature for analysis by both
radiologists and computerized methods.

The shape of a mass can be described as regular or
irregular, lobulated or not lobulated, circular or ovoid.
Generally, shape is not as important as margin
characteristics in the determination of the benign versus
malignant status of a mass. An irregular shape can, however,
be a useful sign of malignancy. We did not use a single
measure to directly characterize the shape of a mass in our
scheme. However, one can correctly identify irregular masses
as suspicious for malignancy based on the spiculation
measure, because the direction of the maximum gradient along
the margin of an irregularly shaped mass can vary as greatly
as that of a spiculated mass. A lobulated mass also has a
higher spiculation value than a smooth circular or ovoid
mass because the direction of the maximum gradient relative
to the radial direction along the margin of a lobulated mass
varies more than that along the margin of a smooth mass.
Thus, a smooth lobulated mass will be ranked as more suspect
for malignancy than a smooth circular or ovoid mass, similar
to the rank ordering radiologists would give. Some lobulated
masses might be classified into the spiculated category if
they were very highly lobulated.

We have studied three types of classifiers with which to
merge the computer-extracted features. The three classifiers
mimic three possible ways that radiologists might merge the
information that they perceive in the task of classification
of masses. The combination of the rule-based method by using
the spiculation measure with the ANN is probably the one
that serves this task best for several reasons. First,
introduction of the well-known importance of spiculation,
which was also shown here, into our system with a one-step
rule-based method allows the ANN to "concentrate" on
acquiring its own knowledge for the more difficult features
for which considerable overlap in the appearance of benign
and malignant masses occurs. Second, the good correlation
between the computer spiculation measure and an expert
mammographer's spiculation ratings, as well as the similar
performance of the two in distinguishing between benign and
malignant masses, allows a reliable translation of the
"intuitive" knowledge into a simple rule in the hybrid
system. Third, in clinical practice, radiologists are more
likely to process the information they perceive in the same
way that is used in our hybrid system, namely, examining for
spiculation, the only truly diagnostic feature for
malignancy, first and then analyzing all the possible
secondary features to determine the likelihood of
malignancy.

We evaluated the classifiers by using self-consistency and
round-robin methods. The consistency method yielded Az
values of 0.92, 0.93, and 0.98 for the two-step rule-based
method, the four-input ANN, and the hybrid system,
respectively. The round-robin evaluation was performed to
test the generalization ability of the classifiers. The
two-step rule-based method was investigated only to
understand the features, and so we did not proceed with
round-robin testing of this method. The round-robin
evaluation of the four-input ANN yielded an Az value of
0.90. The evaluation of the hybrid system with round-robin
analysis on the three-input ANN yielded an Az of 0.94. The
rule was set on the spiculation measure in the hybrid system
because spiculation is the major feature used intuitively by
radiologists in predicting malignancy, and this rule did not
undergo round-robin analysis. In addition, the aim of the
spiculation measure was not to separate malignant masses
completely from benign masses but to identify only those
masses that were very likely to be malignant.

Generalization of a trained network is influenced by three
factors: the size and efficiency of the training set, the
architecture of the network, and the physical complexity of
the problem. The round-robin method is one way to validate a
trained network on a data set different from the one used in
the training. A valid generalization, however, can be
guaranteed only when the training set size is sufficiently
large relative to the architecture of the network (37). We
are aware of the inadequacies of using a finite database,
which is usually what is available in the medical field, and
thus we provided details about the characteristics of the
clinical database (Fig 1) used in the study.
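
To give a sense of "training set size relative to the architecture," the sketch below counts the adjustable weights in the two networks and compares them with the image counts given in Table 2. It assumes fully connected layers with one bias per hidden and output unit, which the text does not state, and the ratio is only a rough indicator.

```python
def n_weights(layer_sizes):
    """Weights plus one bias per non-input unit in a fully connected net."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


ann_alone = n_weights([4, 2, 1])   # (4 + 1) * 2 + (2 + 1) * 1
hybrid_ann = n_weights([3, 2, 1])  # (3 + 1) * 2 + (2 + 1) * 1

# Image counts from Table 2: 95 in total, 36 in the nonspiculated category.
print(f"4-2-1: {ann_alone} weights, {95 / ann_alone:.1f} images per weight")
print(f"3-2-1: {hybrid_ann} weights, {36 / hybrid_ann:.1f} images per weight")
```

Under these assumptions both networks are small, but the number of training images per weight is still in the single digits, which is consistent with the caution expressed above.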

Of the three types of classifiers, the hybrid system
performed the best in differentiating malignant from benign
masses. One could expect that the performance of the ANN
would be similar to that of the hybrid system because they
both use the same four features as the inputs. Of the three
classifiers, however, the ANN performed the worst in terms
of Az, 0.90Az', and the performance at 100% sensitivity.
Although the difference in Az between the hybrid system and
the ANN was not statistically significant (two-tailed P
value of .2), the difference in performance between the two
classifiers at the high sensitivities (0.90Az') was found to
be statistically significant (two-tailed P value of .008).
This difference resulted from the dominant nature of the
spiculation measure, which kept the ANN-alone method from
learning the importance of the other three features in
differentiating subtle malignant masses from benign masses
(38). It seems that only when the ANN was used after the
spiculation criterion did it learn to effectively interpret
the complicated interrelationship among the remaining
features in determining the benign or malignant status of
the subtle cases. With an unlimited database, the ANN-alone
method might learn as well as the combined rule-based ANN
method in distinguishing between benign and malignant masses
(both spiculated and nonspiculated). Nevertheless, it was
still more efficient to bring well-known knowledge directly
into a classifier to avoid lengthy training times, the need
for larger databases, and uncertainty in whether the final
learning goal (well-known rules) would be achieved.

The performances of the expert mammographer and our
computerized classification scheme were significantly better
than the average performance of radiologists with less
mammographic experience, and this difference was even
greater at the high sensitivity levels (P values ranging
from .013 to <.001). Variability in radiologists'
interpretations of mammograms is due to the differences in
their knowledge and experience and has been demonstrated in
our observer study and in others' work (39).

The superior performance of the computerized classification
scheme in distinguishing malignant masses from benign
masses, especially at high sensitivity levels, emulates the
performance of an expert mammographer. This finding
underscores the potential usefulness of a computer-aided
diagnosis classification scheme as an aid to improving the
performance of less experienced mammographers and thus
reducing the variability among radiologists in their
mammographic interpretation and reducing the number of
biopsies performed for benign masses.